=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some information on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.)  They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

=head2 Working with SVs

An SV can be created and loaded with one command.  There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (Size_t, usually defined as size_t in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

In the unlikely case of an SV requiring more complex initialisation, you
can create an empty SV with newSV(len).  If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the NUL) bytes of storage allocated, accessible via SvPVX.  In both cases
the SV has value undef.

    SV *empty = newSV(0);   /* no storage allocated  */
    SV *buf   = newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated  */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>.  Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.
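The difference between letting the length be computed with C<strlen> and passing it explicitly becomes visible as soon as the data contains a NUL byte. The sketch below is plain C, not the Perl API. C<struct toy_pv>, C<toy_setpv> and C<toy_setpvn> are hypothetical stand-ins that mimic only the length handling of C<sv_setpv> and C<sv_setpvn>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an SV's string slot; not perl's real layout. */
struct toy_pv {
    char  *pv;   /* buffer, always NUL-terminated */
    size_t cur;  /* number of bytes of real data */
};

/* Like sv_setpvn: copy exactly len bytes (NULs included), then add a NUL. */
static void toy_setpvn(struct toy_pv *s, const char *ptr, size_t len)
{
    char *buf = malloc(len + 1);
    assert(buf != NULL);
    memcpy(buf, ptr, len);   /* copy before freeing, in case ptr aliases s->pv */
    buf[len] = '\0';
    free(s->pv);
    s->pv = buf;
    s->cur = len;
}

/* Like sv_setpv: the length comes from strlen, so it stops at the first NUL. */
static void toy_setpv(struct toy_pv *s, const char *ptr)
{
    toy_setpvn(s, ptr, strlen(ptr));
}
```

Given C<"ab\0cd">, C<toy_setpv> keeps only C<"ab">, while C<toy_setpvn> with an explicit length of 5 keeps all five bytes.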

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs.  The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>).  This pointer may be NULL if that information is not
important.  Note that this function requires you to specify the length of
the format.
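The pairing of C<sv_setpvf> with C<sv_vsetpvfn> follows the usual C convention of a C<...> front end that packages its arguments and hands a C<va_list> to a worker, in the same way that C<sprintf> is paired with C<vsprintf>. The sketch below shows that pattern in plain C. The names C<toy_format> and C<toy_vformat> are hypothetical. Unlike C<sv_vsetpvfn>, this worker accepts neither an array of SVs nor a taint flag. It does mirror the C<va_list *> parameter from the signature above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The va_list worker, in the role of sv_vsetpvfn / vsprintf.  Returns the
   length the full result needs (excluding the NUL), which exceeds size - 1
   when the output had to be truncated. */
static int toy_vformat(char *buf, size_t size, const char *fmt, va_list *args)
{
    va_list copy;
    int n;
    va_copy(copy, *args);          /* leave the caller's list untouched */
    n = vsnprintf(buf, size, fmt, copy);
    va_end(copy);
    return n;
}

/* The variadic front end, in the role of sv_setpvf / sprintf. */
static int toy_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;
    assert(buf != NULL && fmt != NULL);
    va_start(args, fmt);
    n = toy_vformat(buf, size, fmt, &args);
    va_end(args);
    return n;
}
```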

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic".  See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated, there is a risk of core dumps and
corruption from code that passes it to C functions or system calls
expecting a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>).  If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case.  But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl.  In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len), len);>.
C does not specify the order in which function arguments are evaluated,
so C<len> may be read before C<SvPV> has set it. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);
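The following plain C sketch shows why the split matters. It uses a hypothetical macro C<TOY_PV> that, like C<SvPV>, yields a pointer and assigns the length to its second argument as a side effect. In C<count_char(TOY_PV(s, len), len, 'l')>, the write to C<len> in one argument and the read in another are unsequenced, which C treats as undefined behaviour. The separate statements used below are always well defined. C<count_char> and C<count_l> are illustrative helpers only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro with SvPV's shape: evaluates to the string pointer
   and stores the string's length into len as a side effect. */
#define TOY_PV(s, len) ((len) = strlen(s), (s))

/* Stand-in for any function that takes a pointer and a length. */
static size_t count_char(const char *ptr, size_t len, char c)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++)
        if (ptr[i] == c)
            n++;
    return n;
}

static size_t count_l(const char *s)
{
    size_t len;
    const char *ptr;
    /* Unsafe: count_char(TOY_PV(s, len), len, 'l') would read len in one
       argument while TOY_PV writes it in another. */
    ptr = TOY_PV(s, len);          /* len is set once this statement ends */
    return count_char(ptr, len, 'l');
}
```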

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated.  If so, it will
call the function C<sv_grow>.  Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for the trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).
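The grow-only rule and the extra byte for the NUL can be modelled in plain C. The sketch below uses a hypothetical C<struct toy_sv> whose C<pv>, C<cur> and C<len> fields loosely mirror C<SvPVX>, C<SvCUR> and C<SvLEN>. C<toy_grow> is not perl's C<sv_grow>. In particular, it allocates exactly what is asked for, whereas perl may round sizes up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an SV's string buffer, not perl's real struct. */
struct toy_sv {
    char  *pv;    /* start of the buffer (cf. SvPVX) */
    size_t cur;   /* bytes of string data in use (cf. SvCUR) */
    size_t len;   /* bytes allocated (cf. SvLEN) */
};

/* Grow-only, like SvGROW: reallocate only when newlen exceeds the
   current allocation; a smaller request leaves the buffer alone. */
static char *toy_grow(struct toy_sv *sv, size_t newlen)
{
    if (newlen > sv->len) {
        char *p = realloc(sv->pv, newlen);
        assert(p != NULL);
        sv->pv = p;
        sv->len = newlen;
    }
    return sv->pv;
}

/* Store a C string, asking for one extra byte for the trailing NUL,
   as perl's own code does with SvGROW(sv, len + 1). */
static void toy_set(struct toy_sv *sv, const char *s)
{
    size_t n = strlen(s);
    toy_grow(sv, n + 1);
    memcpy(sv->pv, s, n + 1);      /* copies the NUL too */
    sv->cur = n;
}
```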

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>.  In the second, you specify the length of the string
yourself.  The third function processes its arguments like C<sprintf> and
appends the formatted output.  The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV.  It also forces the second SV
to be interpreted as a string.
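The first two append functions differ only in where the length comes from. The sketch below is plain C with a hypothetical buffer model, not the Perl API. C<toy_catpvn> appends exactly the bytes it is told to, grows the buffer when needed and keeps it NUL-terminated. C<toy_catpv> only computes that length with C<strlen> first. The doubling growth policy is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an SV's string buffer. */
struct toy_sv {
    char  *pv;
    size_t cur;   /* bytes in use, excluding the trailing NUL */
    size_t len;   /* bytes allocated */
};

/* Like sv_catpvn: append exactly n bytes, then re-terminate.
   In this sketch ptr must not point into sv->pv, since realloc may move it. */
static void toy_catpvn(struct toy_sv *sv, const char *ptr, size_t n)
{
    size_t need = sv->cur + n + 1;      /* + 1 for the NUL */
    if (need > sv->len) {
        size_t newlen = need * 2;       /* assumed policy: grow by doubling */
        char *p = realloc(sv->pv, newlen);
        assert(p != NULL);
        sv->pv = p;
        sv->len = newlen;
    }
    memcpy(sv->pv + sv->cur, ptr, n);
    sv->cur += n;
    sv->pv[sv->cur] = '\0';
}

/* Like sv_catpv: the length is computed with strlen. */
static void toy_catpv(struct toy_sv *sv, const char *ptr)
{
    toy_catpvn(sv, ptr, strlen(ptr));
}
```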

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic".  See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.
Its address can be used whenever an C<SV*> is needed.
However, you have to be careful when using C<&PL_sv_undef> as a value in AVs
or HVs (see L<AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively.  Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (return_real_value) {    /* some condition of your own */
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise.  Instead, when there is no real
value, it returns a NULL pointer which, somewhere down the line, will cause
a segmentation violation, bus error, or just weird results.  Change the zero
to C<&PL_sv_undef> in the first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>.  Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

  % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
  SV = PVIV(0x8128450) at 0x81340f0
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    IV = 1  (OFFSET)
    PV = 0x8135781 ( "1" . ) "2345"\0
    CUR = 4
    LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.
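The offset hack can be modelled in a few lines of plain C. The C<struct toy_sv> below is hypothetical. Its fields echo C<SvPVX>, C<SvCUR>, C<SvLEN>, the IV slot and the C<OOK> flag, but it is not perl's layout. Chopping moves no bytes: it advances the visible start and records the offset. The offset is needed again only to recover the real start when freeing. For C<"12345"> with one byte chopped, the model ends with the same C<CUR> and C<LEN> as the dump above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a string SV using the offset hack. */
struct toy_sv {
    char  *pvx;     /* visible start of the string (cf. SvPVX) */
    size_t cur;     /* visible length (cf. SvCUR) */
    size_t len;     /* bytes allocated from pvx onwards (cf. SvLEN) */
    size_t offset;  /* bytes chopped off the front (perl keeps this in the IV slot) */
    int    ook;     /* "offset OK" flag */
};

static void toy_new(struct toy_sv *sv, const char *s)
{
    sv->cur = strlen(s);
    sv->len = sv->cur + 1;
    sv->pvx = malloc(sv->len);
    assert(sv->pvx != NULL);
    memcpy(sv->pvx, s, sv->len);
    sv->offset = 0;
    sv->ook = 0;
}

/* Like sv_chop: discard everything before ptr without moving any bytes. */
static void toy_chop(struct toy_sv *sv, char *ptr)
{
    size_t delta;
    assert(ptr >= sv->pvx && ptr <= sv->pvx + sv->cur);
    delta = (size_t)(ptr - sv->pvx);
    sv->offset += delta;
    sv->ook = 1;
    sv->pvx += delta;
    sv->cur -= delta;
    sv->len -= delta;
}

/* The real start of the allocation matters only when freeing. */
static void toy_free(struct toy_sv *sv)
{
    free(sv->pvx - sv->offset);
    sv->pvx = NULL;
}
```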

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILLp> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.
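The array version of the trick can be sketched the same way. In the hypothetical C<struct toy_av> below, C<alloc> and C<array> play the roles of C<AvALLOC> and C<AvARRAY>. C<fill> and C<max> stand for the highest visible index and the highest index usable without reallocating. Elements are plain C<int>s instead of C<SV*>s, and none of this is perl's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of an AV's element storage. */
struct toy_av {
    int *alloc;   /* real start of the C array (cf. AvALLOC) */
    int *array;   /* first visible element (cf. AvARRAY) */
    long fill;    /* highest visible index, -1 when empty */
    long max;     /* highest index usable from array without reallocating */
};

static void toy_av_init(struct toy_av *av, const int *vals, long n)
{
    long i;
    av->alloc = malloc((size_t)(n > 0 ? n : 1) * sizeof *av->alloc);
    assert(av->alloc != NULL);
    for (i = 0; i < n; i++)
        av->alloc[i] = vals[i];
    av->array = av->alloc;
    av->fill = n - 1;
    av->max = n - 1;
}

/* Like $#array / av_len: the highest index, or -1 for an empty array. */
static long toy_av_len(const struct toy_av *av)
{
    return av->fill;
}

/* shift without moving data: just advance the visible start. */
static int toy_av_shift(struct toy_av *av)
{
    int first;
    assert(av->fill >= 0);
    first = av->array[0];
    av->array++;
    av->fill--;
    av->max--;
    return first;
}

/* Only freeing needs the real start of the allocation. */
static void toy_av_free(struct toy_av *av)
{
    free(av->alloc);
    av->alloc = av->array = NULL;
}
```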

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros.  Because a scalar can be both a number and a string,
more than one of these macros may return TRUE for the same scalar, and calling
the C<Sv*V> macros will do the appropriate conversion of string to
integer/double or integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV.  The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
routine rather than directly, so SvIOK is false. Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
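The lossy-conversion rule can be illustrated with hypothetical flag bits in plain C. The names and values below are invented for the sketch and are not perl's definitions, and perl's real conversion rules are considerably more involved. The point shown is only that converting 3.5 to an integer sets the private "IV slot is filled" bit but not the public "IV is faithful" bit, while converting 3.0 sets both.

```c
#include <assert.h>

/* Hypothetical flag bits modelled on the public/private pairs above. */
enum {
    TOY_IOK  = 1 << 0,   /* public: the IV slot is a faithful value */
    TOY_NOK  = 1 << 1,
    TOY_IOKp = 1 << 2,   /* private: the IV slot holds something */
    TOY_NOKp = 1 << 3
};

struct toy_num {
    double   nv;
    long     iv;
    unsigned flags;
};

static void toy_set_nv(struct toy_num *n, double v)
{
    n->nv = v;
    n->flags = TOY_NOK | TOY_NOKp;
}

/* Convert NV to IV (assumes nv fits in a long); the public IOK flag is
   set only when the conversion lost nothing. */
static long toy_iv(struct toy_num *n)
{
    n->iv = (long)n->nv;             /* truncates toward zero */
    n->flags |= TOY_IOKp;
    if ((double)n->iv == n->nv)
        n->flags |= TOY_IOK;
    return n->iv;
}
```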

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV.  The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(I32 num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s.  Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on AVs:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value.  You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    I32   av_len(AV*);
    SV**  av_fetch(AV*, I32 key, I32 lval);
    SV**  av_store(AV*, I32 key, SV* val);

The C<av_len> function returns the highest index value in the array (just
like $#array in Perl).  If the array is empty, -1 is returned.  The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero and no element exists at that index, then C<av_fetch> will
store an undef value there.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>.  Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak.  Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.
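The ownership rule for C<av_store> (and, as noted later, for C<hv_store>) can be pictured with a hypothetical refcounted value in plain C. C<toy_store> takes over the caller's reference without incrementing it and, like C<av_store>, returns a pointer to the slot, or NULL on failure. On failure, the caller still owns the reference and must drop it. None of these names belong to the Perl API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted value standing in for an SV. */
struct toy_val {
    int refcnt;
};

static struct toy_val *toy_new_val(void)
{
    struct toy_val *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refcnt = 1;                  /* the creator owns one reference */
    return v;
}

static void toy_refcnt_dec(struct toy_val *v)
{
    if (--v->refcnt == 0)
        free(v);
}

#define TOY_SLOTS 4
struct toy_array {
    struct toy_val *slot[TOY_SLOTS];
};

/* Takes over the caller's reference (no increment); returns a pointer to
   the slot, like av_store's SV**, or NULL if the key is unusable. */
static struct toy_val **toy_store(struct toy_array *a, int key, struct toy_val *v)
{
    if (key < 0 || key >= TOY_SLOTS)
        return NULL;                /* failure: ownership stays with the caller */
    if (a->slot[key] != NULL)
        toy_refcnt_dec(a->slot[key]);
    a->slot[key] = v;
    return &a->slot[key];
}

/* Caller side: release our reference ourselves if the store failed. */
static int store_or_release(struct toy_array *a, int key, struct toy_val *v)
{
    if (toy_store(a, key, v) == NULL) {
        toy_refcnt_dec(v);          /* otherwise v would leak */
        return 0;
    }
    return 1;
}
```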

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, I32 key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself.  The C<av_undef> function will
delete all the elements in the array plus the array itself.  The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements.  If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on HVs:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key).  The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you).  The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>.  To access the scalar value, you must first dereference the return
value.  However, you should check to make sure that the return value is
not NULL before dereferencing it.

These two functions check whether a hash table entry exists, and delete
an entry, respectively:

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table.  C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead).  The key is a string pointer; the value is an C<SV*>.  However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value is returned as the function's
               SV* result */
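The traversal that C<hv_iterinit> and C<hv_iternext> perform over chains of HE structures can be pictured with a toy chained hash table. Everything in this plain C sketch is hypothetical: C<toy_he>, C<toy_hv>, C<toy_store>, C<toy_iterinit> and C<toy_iternext>. It uses a fixed bucket count, C<int> values instead of C<SV*>s and a simple multiply-by-33 string hash. It shows only the general shape: walk the current chain, and when it runs out, move on to the next non-empty bucket.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUCKETS 8

/* Hypothetical entry echoing HE: key, key length, value, chain link. */
struct toy_he {
    char          *key;
    size_t         klen;
    int            val;
    struct toy_he *next;
};

struct toy_hv {
    struct toy_he *bucket[TOY_BUCKETS];
    size_t         riter;   /* next bucket to visit */
    struct toy_he *eiter;   /* entry returned last, NULL before the first */
};

static unsigned long toy_hash(const char *key, size_t klen)
{
    unsigned long h = 0;
    while (klen--)
        h = h * 33 + (unsigned char)*key++;
    return h;
}

static void toy_store(struct toy_hv *hv, const char *key, int val)
{
    size_t klen = strlen(key);
    struct toy_he **head = &hv->bucket[toy_hash(key, klen) % TOY_BUCKETS];
    struct toy_he *he;
    for (he = *head; he != NULL; he = he->next)
        if (he->klen == klen && memcmp(he->key, key, klen) == 0) {
            he->val = val;          /* existing key: overwrite in place */
            return;
        }
    he = malloc(sizeof *he);
    assert(he != NULL);
    he->key = malloc(klen + 1);
    assert(he->key != NULL);
    memcpy(he->key, key, klen + 1);
    he->klen = klen;
    he->val = val;
    he->next = *head;               /* prepend to the bucket's chain */
    *head = he;
}

/* In the role of hv_iterinit: reset the iterator. */
static void toy_iterinit(struct toy_hv *hv)
{
    hv->riter = 0;
    hv->eiter = NULL;
}

/* In the role of hv_iternext: next entry in the chain, else the first
   entry of the next non-empty bucket; NULL when everything was visited. */
static struct toy_he *toy_iternext(struct toy_hv *hv)
{
    struct toy_he *he = hv->eiter ? hv->eiter->next : NULL;
    while (he == NULL && hv->riter < TOY_BUCKETS)
        he = hv->bucket[hv->riter++];
    hv->eiter = he;
    return he;
}
```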

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:

    hash = 0;
    while (klen--)
        hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5);                  /* after 5.6 */

The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.  Note, however, that perl 5.8.0
replaced this algorithm with Bob Jenkins' "One-at-a-Time" hash, and the
definition may change again in later versions, so do not depend on the
specific values that C<PERL_HASH> produces.
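A direct C transcription of the loop above, including the final mixing step, is below. It reproduces only the algorithm as written in this section and is not guaranteed to match the C<PERL_HASH> macro of any particular perl build. Two choices are assumptions of the sketch: the 32-bit wrap-around arithmetic, and treating key bytes as unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The multiply-by-33 hash as documented above, with the post-5.6 step. */
static uint32_t documented_hash(const char *key, size_t klen)
{
    uint32_t hash = 0;
    while (klen--)
        hash = (hash * 33) + (unsigned char)*key++;
    hash = hash + (hash >> 5);      /* the step added in 5.6 */
    return hash;
}
```

For example, C<"a"> (byte 97) hashes to 97 + (97 E<gt>E<gt> 5) = 100.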

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures.  These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time).  See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries.  Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once.  See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that neither C<hv_store> nor C<hv_store_ent> increments the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. AVs use C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code.
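
The reason C<exists> gives a different answer can be sketched with a
hypothetical model in plain C. This is not the Perl API. In the model, a
fresh array fills every slot with one shared sentinel object, which plays
the role the text assigns to C<&PL_sv_undef>. "Exists" just compares a
slot against that sentinel. Storing the sentinel is therefore
indistinguishable from never storing anything, while a separately created
undefined value does exist.

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { int refcnt; } toy_sv;

    /* Hypothetical stand-in for &PL_sv_undef: one shared object. */
    static toy_sv toy_undef_sentinel = {1};
    #define TOY_UNDEF (&toy_undef_sentinel)

    #define TOY_LEN 8
    typedef struct { toy_sv *slot[TOY_LEN]; } toy_av;

    /* A fresh array marks every slot as "not yet initialized". */
    static void toy_av_init(toy_av *av)
    {
        for (size_t i = 0; i < TOY_LEN; i++)
            av->slot[i] = TOY_UNDEF;
    }

    static void toy_av_store(toy_av *av, size_t i, toy_sv *val)
    {
        if (i < TOY_LEN)
            av->slot[i] = val;
    }

    /* "exists" cannot tell a stored sentinel from an untouched slot. */
    static bool toy_av_exists(const toy_av *av, size_t i)
    {
        return i < TOY_LEN && av->slot[i] != TOY_UNDEF;
    }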

Other problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value stored under C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>.  The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not.  For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check its return value:

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar

See the F<sv.h> header file for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming.  In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class).  Once it has been blessed, the reference can be used to
access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value.  The C<stash> argument
specifies which class the reference will belong to.  See
L<Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

Upgrades C<rv> to a reference if it is not already one, and creates a new SV
for C<rv> to point to.  If C<classname> is non-null, the new SV is blessed
into the specified class.  The new SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

Copies an integer, unsigned integer or double into an SV whose reference is
C<rv>.  The SV is blessed if C<classname> is non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

Copies the pointer value (I<the address, not the string!>) into an SV whose
reference is C<rv>.  The SV is blessed if C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

Copies a string into an SV whose reference is C<rv>.  Set length to 0 to let
Perl calculate the string length.  The SV is blessed if C<classname> is
non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, char* pv, STRLEN length);

Tests whether the SV is blessed into the specified class.  It does not
check inheritance relationships.

    int  sv_isa(SV* sv, const char* name);

Tests whether the SV is a reference to a blessed object.

    int  sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. The SV can be
either a reference to a blessed object or a string containing a class name.
This is the function implementing the C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", TRUE);
    AV*  get_av("package::varname", TRUE);
    HV*  get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter.  The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<TRUE> argument to enable certain extra features.  Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

  Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

  Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1.  If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten.  At the internal level, however, reference counts can be
manipulated with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument.  The C<newRV_inc> function, you will recall,
creates a reference to the specified argument.  As a side effect,
it increments the argument's reference count.  If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one.  Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two.  Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't!  Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates.  This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>.  Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.
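
The arithmetic of the leak described above can be checked with a small
hypothetical model in plain C, not the Perl API. C<toy_newrv_inc> and
C<toy_newrv_noinc> only mimic the reference-count behaviour of
C<newRV_inc> and C<newRV_noinc>. No memory is actually allocated or
freed; a count of zero stands for "would be destroyed".

    #include <stddef.h>

    /* Hypothetical model: only reference counts, no real memory. */
    typedef struct { int refcnt; } toy_sv;
    typedef struct { toy_sv *target; } toy_rv;

    /* Like newRV_inc: the new reference adds a count to its target. */
    static toy_rv toy_newrv_inc(toy_sv *t)
    {
        t->refcnt++;
        return (toy_rv){ t };
    }

    /* Like newRV_noinc: the reference adopts the caller's count. */
    static toy_rv toy_newrv_noinc(toy_sv *t)
    {
        return (toy_rv){ t };
    }

    /* Destroying a reference drops one count on its target;
       a count of 0 means the target would be destroyed. */
    static void toy_rv_destroy(toy_rv *rv)
    {
        rv->target->refcnt--;
        rv->target = NULL;
    }

With C<toy_newrv_inc>, a fresh value (count 1) is left at count 1 after
the reference is destroyed, with nothing pointing to it. With
C<toy_newrv_noinc>, it reaches 0 as intended.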

There are some convenience functions available that can help with the
destruction of xVs.  These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later".  Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function.  The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, SAVETMPS and FREETMPS.
See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.
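
One way to picture this deferred decrement is as a stack of pending
decrements. The following C model is a minimal sketch under that
assumption. The names C<toy_2mortal>, C<toy_savetmps> and C<toy_freetmps>
are hypothetical, and the model is far simpler than perl's real
temporaries stack. It shows that mortalizing only schedules a decrement,
and that mortalizing the same value twice schedules two.

    #include <assert.h>
    #include <stddef.h>

    typedef struct { int refcnt; } toy_sv;

    /* Hypothetical temporaries stack: each entry is one pending
       decrement, loosely modelled on perl's mortal mechanism. */
    #define TOY_TMPS_MAX 16
    static toy_sv *toy_tmps[TOY_TMPS_MAX];
    static size_t toy_tmps_ix = 0;      /* next free slot             */
    static size_t toy_tmps_floor = 0;   /* start of the current scope */

    /* Like sv_2mortal: schedule a decrement, do not perform it yet. */
    static toy_sv *toy_2mortal(toy_sv *sv)
    {
        assert(toy_tmps_ix < TOY_TMPS_MAX);
        toy_tmps[toy_tmps_ix++] = sv;
        return sv;
    }

    /* Like SAVETMPS: start a new scope of temporaries.  Returns the
       old floor so the caller can restore it when the scope ends. */
    static size_t toy_savetmps(void)
    {
        size_t old = toy_tmps_floor;
        toy_tmps_floor = toy_tmps_ix;
        return old;
    }

    /* Like FREETMPS: carry out every pending decrement in this scope. */
    static void toy_freetmps(void)
    {
        while (toy_tmps_ix > toy_tmps_floor)
            toy_tmps[--toy_tmps_ix]->refcnt--;
    }

In the model, mortalizing the same value twice drives its count to -1.
In perl, the first decrement would already have freed the SV, so the
second one would operate on an already-freed SV.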

"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
For example an SV which is created just to pass a number to a called sub
is made mortal to have it cleaned up automatically when it's popped off
the stack. Similarly, results returned by XSUBs (which are pushed on the
stack) are often made mortal.

To create a mortal variable, use the functions:

    SV*  sv_newmortal()
    SV*  sv_2mortal(SV*)
    SV*  sv_mortalcopy(SV*)

The first call creates a mortal SV (with no value), the second converts an
existing SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and
the third creates a mortal copy of an existing SV.
Because C<sv_newmortal> gives the new SV no value, it must normally be given
one via C<sv_setpv>, C<sv_setiv>, etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

Because that takes two C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

You should be careful about creating mortal variables.  Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times. Thinking of "Mortalization"
as deferred C<SvREFCNT_dec> should help to minimize such problems.
For example, if you are passing an SV which you I<know> has a high enough
REFCNT to survive its use on the stack, you need not do any mortalization.
If you are not sure, then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or
making a C<sv_mortalcopy>, is safer.

The mortal routines are not just for SVs -- AVs and HVs can be
made mortal by passing their address (cast to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package.  Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value).  This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package.  To get at the items in other packages, append the
string "::" to the package name.  The items in the C<Foo> package are in
the stash C<Foo::> in C<PL_defstash>.  The items in the C<Bar::Baz> package
are in the stash C<Baz::> in C<Bar::>'s stash.
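
The lookup path described above can be sketched as a hypothetical tree
walk in plain C. C<toy_stash> and C<toy_find_stash> are invented names,
and each node holds at most four child stashes. Real stashes are hashes
of GVs, not arrays of children. The sketch only shows how a name such as
C<Bar::Baz> turns into the key sequence C<Bar::>, then C<Baz::>, starting
from C<PL_defstash>.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical stash node: its key in the parent ("Foo::")
       plus up to four nested package stashes. */
    typedef struct toy_stash {
        const char *key;
        struct toy_stash *child[4];
    } toy_stash;

    static toy_stash *toy_child(toy_stash *s, const char *key)
    {
        for (size_t i = 0; i < 4 && s->child[i] != NULL; i++)
            if (strcmp(s->child[i]->key, key) == 0)
                return s->child[i];
        return NULL;
    }

    /* Walk "Bar::Baz" as defstash -> "Bar::" -> "Baz::". */
    static toy_stash *toy_find_stash(toy_stash *defstash, const char *pkg)
    {
        char key[64];
        toy_stash *s = defstash;

        while (s != NULL && *pkg != '\0') {
            const char *sep = strstr(pkg, "::");
            size_t n = sep ? (size_t)(sep - pkg) : strlen(pkg);

            if (n + 3 > sizeof key)
                return NULL;              /* name part too long for model */
            memcpy(key, pkg, n);
            memcpy(key + n, "::", 3);     /* append "::" and the NUL */
            s = toy_child(s, key);
            pkg += n + (sep ? 2 : 0);
        }
        return s;
    }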

To get the stash pointer for a particular package, use the function:

    HV*  gv_stashpv(const char* name, I32 create)
    HV*  gv_stashsv(SV*, I32 create)

The first function takes a literal string, the second uses the string stored
in the SV.  Remember that a stash is just a hash table, so you get back an
C<HV*>.  The C<create> flag will create a new package if it is set.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want.  The default package is called C<main>.  If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternatively, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV*  SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char*  HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV*  sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash.  The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value: an integer,
double, pointer, or reference.  Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data.  For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data.  The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first.  This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int  dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", TRUE);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
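
The flag behaviour explained above can be modelled with two hypothetical
flag bits in plain C. The names C<toy_sv>, C<TOY_IOK>, C<TOY_POK>,
C<toy_setiv> and C<toy_setpv> are invented for this sketch. Each setter
keeps only its own flag, as the text says the C<sv_set*v> routines do,
so the earlier flag has to be switched back on afterwards.

    #include <string.h>

    /* Hypothetical flag bits standing in for the IOK and POK flags. */
    enum { TOY_IOK = 1u << 0, TOY_POK = 1u << 1 };

    typedef struct {
        unsigned flags;
        long     iv;
        char     pv[64];
    } toy_sv;

    /* Like sv_setiv as described: store the integer, keep only IOK. */
    static void toy_setiv(toy_sv *sv, long iv)
    {
        sv->iv = iv;
        sv->flags = TOY_IOK;
    }

    /* Like sv_setpv as described: store the string, keep only POK. */
    static void toy_setpv(toy_sv *sv, const char *s)
    {
        strncpy(sv->pv, s, sizeof sv->pv - 1);
        sv->pv[sizeof sv->pv - 1] = '\0';
        sv->flags = TOY_POK;
    }

    /* Same order as the dberror example: integer first, then string,
       then switch the integer flag back on (cf. SvIOK_on). */
    static void toy_set_dual(toy_sv *sv, long code, const char *text)
    {
        toy_setiv(sv, code);      /* flags: IOK               */
        toy_setpv(sv, text);      /* flags: POK (IOK dropped) */
        sv->flags |= TOY_IOK;     /* flags: IOK | POK         */
    }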

=head2 Magic Variables

[This section still under construction.  Ignore everything here.  Post no
bills.  Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have.  These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        SV*         mg_obj;
        char*       mg_ptr;
        I32         mg_len;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the C<sv_magic> function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>. Perl then continues by adding new magic
to the beginning of the linked list of magical features.  Any prior entry
of the same type of magic is deleted.  Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.
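
The default list handling described above can be sketched with a
hypothetical, stripped-down node type in plain C. C<toy_magic>,
C<toy_add_magic> and C<toy_unlink_type> are invented names. The nodes are
owned by the caller, so unlinking does not free anything. The sketch only
shows the prepend and the replacement of an earlier entry of the same
type.

    #include <stddef.h>

    /* Hypothetical, stripped-down MAGIC node: the chain link and the type. */
    typedef struct toy_magic {
        struct toy_magic *more;   /* plays the role of mg_moremagic */
        char type;                /* plays the role of mg_type      */
    } toy_magic;

    /* Unlink every node of the given type.  The nodes belong to the
       caller, so nothing is freed here. */
    static void toy_unlink_type(toy_magic **head, char type)
    {
        toy_magic **pp = head;
        while (*pp != NULL) {
            if ((*pp)->type == type)
                *pp = (*pp)->more;
            else
                pp = &(*pp)->more;
        }
    }

    /* Default behaviour described above: remove any earlier entry of
       the same type, then put the new node at the front of the chain. */
    static void toy_add_magic(toy_magic **head, toy_magic *mg)
    {
        toy_unlink_type(head, mg->type);
        mg->more = *head;
        *head = mg;
    }

Calling C<toy_unlink_type> on its own is roughly what removing all magic
of one type looks like in this model.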

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null and C<namlen> E<gt>= 0 a malloc'd
copy of the name is stored in the C<mg_ptr> field.

The C<sv_magic> function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L<Magic Virtual Tables> section below.  The C<how> argument is also
stored in the C<mg_type> field. The value of C<how> should be chosen
from the set of macros C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
these macros were added, Perl internals used character literals directly,
so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar>, for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure.  If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented.  If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer,
then C<obj> is merely stored, without the reference count being incremented.
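
The rule for C<obj> above can be condensed into one predicate. This is a
hypothetical plain-C helper, not part of the Perl API, and
C<toy_magic_incs_obj> is an invented name. So that it compiles without
F<perl.h>, it compares against C<'#'>, the old-style C<mg_type> character
that the table of magic types lists for C<PERL_MAGIC_arylen>.

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical: '#' is the old-style mg_type character for
       PERL_MAGIC_arylen. */
    #define TOY_MAGIC_ARYLEN '#'

    /* Returns true when, per the rule above, sv_magic would
       increment the reference count of obj. */
    static bool toy_magic_incs_obj(const void *sv, const void *obj, char how)
    {
        return obj != NULL && obj != sv && how != TOY_MAGIC_ARYLEN;
    }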

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function C<sv_unmagic>:

    void sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value that was used when
the C<SV> was made magical.

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL>, short for "Magic Virtual Table": a structure of function pointers
that handle the various operations that might be applied to that variable.

The C<MGVTBL> has five pointers to the following routine types:

    int  (*svt_get)(SV* sv, MAGIC* mg);
    int  (*svt_set)(SV* sv, MAGIC* mg);
    U32  (*svt_len)(SV* sv, MAGIC* mg);
    int  (*svt_clear)(SV* sv, MAGIC* mg);
    int  (*svt_free)(SV* sv, MAGIC* mg);

This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 19 types (or 21 with overloading turned on).  These different
structures contain pointers to various routines that perform additional
actions depending on which function is being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called.  All the various routines for the various magical types begin
with C<magic_>.  NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.
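
The dispatch described above can be sketched in plain C with a
hypothetical vtable that has the same five slots as C<MGVTBL>. The names
C<toy_vtbl>, C<toy_magic>, C<toy_sv>, C<counting_get> and C<toy_read> are
invented. C<toy_read> plays the part of the code that calls C<svt_get>
on a magical SV before its value is used. Unused slots are left null, as
in C<vtbl_sv>.

    #include <stddef.h>

    typedef struct toy_sv toy_sv;
    typedef struct toy_magic toy_magic;

    /* Hypothetical vtable with the same slot layout as MGVTBL above. */
    typedef struct {
        int      (*svt_get)(toy_sv *sv, toy_magic *mg);
        int      (*svt_set)(toy_sv *sv, toy_magic *mg);
        unsigned (*svt_len)(toy_sv *sv, toy_magic *mg);
        int      (*svt_clear)(toy_sv *sv, toy_magic *mg);
        int      (*svt_free)(toy_sv *sv, toy_magic *mg);
    } toy_vtbl;

    struct toy_magic { const toy_vtbl *vtbl; int counter; };
    struct toy_sv    { long value; toy_magic *mg; };

    /* A "get" hook that refreshes the value before it is read. */
    static int counting_get(toy_sv *sv, toy_magic *mg)
    {
        sv->value = ++mg->counter;
        return 0;
    }

    /* Same shape as { magic_get, magic_set, magic_len, 0, 0 },
       with only the get slot filled in. */
    static const toy_vtbl counting_vtbl = { counting_get, NULL, NULL, NULL, NULL };

    /* Reading a magical scalar: call svt_get first, if present. */
    static long toy_read(toy_sv *sv)
    {
        if (sv->mg && sv->mg->vtbl && sv->mg->vtbl->svt_get)
            sv->mg->vtbl->svt_get(sv, sv->mg);
        return sv->value;
    }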
985*0Sstevel@tonic-gate
986*0Sstevel@tonic-gateThe current kinds of Magic Virtual Tables are:
987*0Sstevel@tonic-gate
988*0Sstevel@tonic-gate    mg_type
989*0Sstevel@tonic-gate    (old-style char and macro)   MGVTBL         Type of magic
990*0Sstevel@tonic-gate    --------------------------   ------         ----------------------------
991*0Sstevel@tonic-gate    \0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
992*0Sstevel@tonic-gate    A  PERL_MAGIC_overload       vtbl_amagic    %OVERLOAD hash
993*0Sstevel@tonic-gate    a  PERL_MAGIC_overload_elem  vtbl_amagicelem %OVERLOAD hash element
994*0Sstevel@tonic-gate    c  PERL_MAGIC_overload_table (none)         Holds overload table (AMT)
995*0Sstevel@tonic-gate						on stash
996*0Sstevel@tonic-gate    B  PERL_MAGIC_bm             vtbl_bm        Boyer-Moore (fast string search)
997*0Sstevel@tonic-gate    D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
998*0Sstevel@tonic-gate						(@+ and @- vars)
    d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
                                                element
    E  PERL_MAGIC_env            vtbl_env       %ENV hash
    e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
    f  PERL_MAGIC_fm             vtbl_fm        Formline ('compiled' format)
    g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target / study()ed string
    I  PERL_MAGIC_isa            vtbl_isa       @ISA array
    i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
    k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
    L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
    l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename element
    m  PERL_MAGIC_mutex          vtbl_mutex     ???
    o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale collate transformation
    P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
    p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
    q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
    r  PERL_MAGIC_qr             vtbl_qr        precompiled qr// regex
    S  PERL_MAGIC_sig            vtbl_sig       %SIG hash
    s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
    t  PERL_MAGIC_taint          vtbl_taint     Taintedness
    U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by extensions
    v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
    V  PERL_MAGIC_vstring        (none)         v-string scalars
    w  PERL_MAGIC_utf8           vtbl_utf8      UTF-8 length+offset cache
    x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
    y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
                                                variable / smart parameter
                                                vivification
    *  PERL_MAGIC_glob           vtbl_glob      GV (typeglob)
    #  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
    .  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
    <  PERL_MAGIC_backref        vtbl_backref   For weak ref data
    ~  PERL_MAGIC_ext            (none)         Available for use by extensions

When an uppercase and a lowercase letter both exist in the table, the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type. Some internals code makes use of this case
relationship.  However, 'v' and 'V' (vec and v-string) are in no way related.

The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
specifically for use by extensions and will not be used by perl itself.
Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information
to variables (typically objects).  This is especially useful because
there is no way for normal perl code to corrupt this private information
(unlike using extra elements of a hash object).

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed.  The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(pTHX_ IV, SV*);
        I32 (*uf_set)(pTHX_ IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first argument and a
pointer to the SV as the second.  A simple example of how to add
C<PERL_MAGIC_uvar> magic is shown below, where C<my_get_fn> and
C<my_set_fn> stand for your own functions with the signatures shown above.
Note that the C<ufuncs> structure is copied by C<sv_magic>, so you can
safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
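
The dispatch described above can be pictured with a self-contained toy model. It is a sketch and not the Perl API. C<toy_sv>, C<struct toy_ufuncs>, C<toy_magic>, C<toy_get> and C<toy_set> are hypothetical stand-ins for an SV, C<struct ufuncs>, C<sv_magic>, and the points where perl runs 'get' and 'set' magic. C<long> replaces C<IV>. The model shows the stored C<uf_index> arriving as the first argument of each callback, and why a stack-allocated C<ufuncs> is safe once it has been copied.

```c
#include <assert.h>
#include <string.h>

/* Toy model only, not the Perl API: toy_sv stands in for an SV. */
typedef struct toy_sv toy_sv;

/* Same shape as struct ufuncs, minus pTHX_ and with long for IV. */
struct toy_ufuncs {
    int  (*uf_val)(long index, toy_sv *sv);   /* run when the value is read */
    int  (*uf_set)(long index, toy_sv *sv);   /* run after a value is stored */
    long uf_index;
};

struct toy_sv {
    long value;
    int  has_magic;
    struct toy_ufuncs uf;   /* a private copy, as sv_magic() makes */
};

/* Attach the callbacks.  The struct is copied, so 'uf' may be a local. */
static void toy_magic(toy_sv *sv, const struct toy_ufuncs *uf)
{
    assert(sv != NULL && uf != NULL);
    memcpy(&sv->uf, uf, sizeof *uf);
    sv->has_magic = 1;
}

/* Read: the 'get' callback runs first and may update sv->value. */
static long toy_get(toy_sv *sv)
{
    if (sv->has_magic && sv->uf.uf_val)
        sv->uf.uf_val(sv->uf.uf_index, sv);
    return sv->value;
}

/* Write: store the value, then let the 'set' callback see it. */
static void toy_set(toy_sv *sv, long v)
{
    sv->value = v;
    if (sv->has_magic && sv->uf.uf_set)
        sv->uf.uf_set(sv->uf.uf_index, sv);
}

/* Hypothetical callbacks that count calls and record the index seen. */
static int  reads, writes;
static long last_index = -1;

static int my_get_fn(long index, toy_sv *sv)
{
    (void)sv;
    reads++;
    last_index = index;
    return 0;
}

static int my_set_fn(long index, toy_sv *sv)
{
    (void)sv;
    writes++;
    last_index = index;
    return 0;
}

/* Mirrors the Umagic XSUB, with uf_index = 42 to make it visible. */
static void toy_umagic(toy_sv *sv)
{
    struct toy_ufuncs uf;          /* stack storage is fine: it is copied */
    uf.uf_val   = &my_get_fn;
    uf.uf_set   = &my_set_fn;
    uf.uf_index = 42;
    toy_magic(sv, &uf);
}
```

After C<toy_umagic(&sv)>, calling C<toy_set(&sv, 7)> runs C<my_set_fn> with index 42. A following C<toy_get(&sv)> runs C<my_get_fn> and returns 7.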

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict.  Typically, restricting the magic to
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it may also be appropriate to add an I32
'signature' at the top of the private data area and check that.
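
Here is a minimal sketch of the 'signature' check. It assumes a private block whose first member is an C<int32_t> tag. The tag value C<TOY_SIGNATURE>, the C<struct toy_private> layout and the helper names are all hypothetical. In an extension the block would hang off the C<mg_ptr> of its C<PERL_MAGIC_ext> magic, and the check lets the extension skip C<~> magic attached by someone else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag; any value unlikely to appear by accident will do. */
#define TOY_SIGNATURE ((int32_t)0x4D594558)   /* the bytes of "MYEX" */

/* Private data for PERL_MAGIC_ext, with the signature first. */
struct toy_private {
    int32_t signature;
    int     counter;          /* the extension's actual payload */
};

/* Prepare a block before hanging it off mg_ptr. */
static void toy_init(struct toy_private *p)
{
    assert(p != NULL);
    p->signature = TOY_SIGNATURE;
    p->counter   = 0;
}

/* Given the mg_ptr of some '~' magic, return our data, or NULL if it
 * belongs to another extension (or there is no data at all). */
static struct toy_private *toy_claim(void *mg_ptr)
{
    struct toy_private *p = mg_ptr;
    if (p == NULL || p->signature != TOY_SIGNATURE)
        return NULL;
    return p;
}
```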

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets.  This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions.  Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.
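
A toy model can show the difference between C<sv_setiv()>, which runs no 'set' magic, and either C<sv_setiv_mg()> or C<sv_setiv()> followed by C<SvSETMAGIC()>. All C<toy_> names and the C<set_hook> field are hypothetical. The hook stands in for whatever 'set' magic the SV carries, such as a tied scalar's STORE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar with an optional 'set' hook. */
typedef struct toy_sv {
    long iv;
    void (*set_hook)(struct toy_sv *);
} toy_sv;

/* Like sv_setiv(): stores the value but does NOT run 'set' magic. */
static void toy_setiv(toy_sv *sv, long v)
{
    assert(sv != NULL);
    sv->iv = v;
}

/* Like SvSETMAGIC(): run the hook if the scalar has one. */
static void toy_setmagic(toy_sv *sv)
{
    if (sv->set_hook)
        sv->set_hook(sv);
}

/* Like sv_setiv_mg(): store, then run 'set' magic. */
static void toy_setiv_mg(toy_sv *sv, long v)
{
    toy_setiv(sv, v);
    toy_setmagic(sv);
}

/* Example hook that records the last value it was told about. */
static long hook_seen = -1;
static void record_hook(toy_sv *sv) { hook_seen = sv->iv; }
```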

=head2 Finding Magic

    MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */

This routine returns a pointer to the C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned.  Also,
if the SV is not of type C<SVt_PVMG>, Perl may core dump.
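
The C<MAGIC> entries of an SV form a singly linked list chained through C<mg_moremagic>. C<mg_find> returns the first entry whose C<mg_type> matches. The sketch below is a toy model that keeps only those two fields. C<toy_magic>, C<toy_mg_add> and C<toy_mg_find> are hypothetical names; by this toy's own convention, C<toy_mg_add> pushes onto the front of the chain. Real code should call C<mg_find> itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down MAGIC: only the chain link and the type letter. */
typedef struct toy_magic {
    struct toy_magic *mg_moremagic;   /* next entry, or NULL */
    char              mg_type;        /* e.g. 'P', 'q', '~' */
} toy_magic;

/* Push 'mg' onto the front of the chain headed by *head (toy convention). */
static void toy_mg_add(toy_magic **head, toy_magic *mg, char type)
{
    assert(head != NULL && mg != NULL);
    mg->mg_type = type;
    mg->mg_moremagic = *head;
    *head = mg;
}

/* Walk the chain; return the first entry of the requested type, or NULL. */
static toy_magic *toy_mg_find(toy_magic *chain, int type)
{
    toy_magic *mg;
    for (mg = chain; mg != NULL; mg = mg->mg_moremagic)
        if (mg->mg_type == type)
            return mg;
    return NULL;
}
```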

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has.  If the C<mg_type>
field is an uppercase letter, then the C<mg_obj> is copied to C<nsv>, but
the C<mg_type> field is changed to be the lowercase letter.
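
The following sketch shows the case convention that C<mg_copy> relies on. It is a simplified toy built on stated assumptions. The hypothetical C<toy_mg_copy> writes its copies into a caller-supplied array instead of attaching them to C<nsv>. It ignores C<key>/C<klen> and any vtable copy hook, and it returns how many entries it produced. For example, tied-container magic C<P> becomes tied-element magic C<p> with the same C<mg_obj>.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical cut-down MAGIC entry. */
typedef struct toy_magic {
    struct toy_magic *mg_moremagic;
    char  mg_type;
    void *mg_obj;      /* e.g. the object a tied container is tied to */
} toy_magic;

/* For every uppercase entry on 'src', write an entry with the same mg_obj
 * and the lowercase type into 'out' (capacity 'max'); return the count.
 * Lowercase (element) entries are skipped. */
static int toy_mg_copy(const toy_magic *src, toy_magic *out, int max)
{
    int count = 0;
    for (; src != NULL; src = src->mg_moremagic) {
        if (isupper((unsigned char)src->mg_type)) {
            assert(count < max);
            out[count].mg_moremagic = NULL;
            out[count].mg_type = (char)tolower((unsigned char)src->mg_type);
            out[count].mg_obj  = src->mg_obj;
            count++;
        }
    }
    return count;
}
```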

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats.  Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl C<tie> function associates a variable with an object that
implements the various FETCH, STORE, etc. methods.  To perform the
equivalent of the perl C<tie> function from an XSUB, you must mimic this
behaviour.  The code below carries out the necessary steps: first it
creates a new hash, then it creates a second hash which it blesses into
the class which will implement the tie methods, and lastly it ties the two
hashes together and returns a reference to the new tied hash.  Note that
the code below does NOT call the TIEHASH method in the MyTie class;
see L</"Calling Perl Routines from within C Programs"> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", TRUE);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>.  It may also return C<NULL>, indicating that the value did not
actually need to be stored in the array.  [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object.  If C<av_store> did return C<NULL>, a call to
C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph applies verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>.  Note the value so returned does not
need to be deallocated, as it is already mortal.  [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object.  Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes.  They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched".  Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects.  Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants.  The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]
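
A few lines of C are enough to model the lazy access described above. In this hypothetical toy, C<toy_av_fetch> plays the role of C<av_fetch> on a tied array. It returns only an empty proxy that remembers the container and the index. C<toy_mg_get> and C<toy_mg_set> are where FETCH and STORE would actually run, and the toy counts those calls in C<fetch_calls> and C<store_calls>. None of these names belong to the Perl API.

```c
#include <assert.h>

#define TOY_SIZE 8

/* Hypothetical tied array: backing store plus counters for method calls. */
typedef struct {
    long data[TOY_SIZE];   /* what the Perl-level FETCH/STORE read/write */
    int  fetch_calls;
    int  store_calls;
} toy_tied_array;

/* Stand-in for the mortal returned by av_fetch(): value + magic link. */
typedef struct {
    long            value;   /* undefined (0 here) until toy_mg_get() runs */
    toy_tied_array *owner;   /* the 'copied magic' pointing at the array */
    int             index;
} toy_proxy;

/* Like av_fetch() on a tied array: no method is called yet. */
static toy_proxy toy_av_fetch(toy_tied_array *av, int index)
{
    toy_proxy p = { 0, av, index };
    assert(index >= 0 && index < TOY_SIZE);
    return p;
}

/* Like mg_get(): this is where FETCH actually runs. */
static void toy_mg_get(toy_proxy *p)
{
    p->owner->fetch_calls++;
    p->value = p->owner->data[p->index];
}

/* Like mg_set(): this is where STORE actually runs. */
static void toy_mg_set(toy_proxy *p)
{
    p->owner->store_calls++;
    p->owner->data[p->index] = p->value;
}
```

The fetch itself counts nothing. Only the later C<toy_mg_get> increments C<fetch_calls>.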

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax.  The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

  {
    local $var = 2;
    ...
  }

This construction is I<approximately> equivalent to

  {
    my $oldvar = $var;
    $var = 2;
    ...
    $var = $oldvar;
  }

The biggest difference is that the first construction reinstates the
initial value of $var irrespective of how control exits the block:
C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit more
efficient as well.

There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically undone
at its end, whether that end is reached explicitly or via a non-local exit
(via die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization should
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.

Inside such a I<pseudo-block> the following services are available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of the enclosing I<pseudo-block>.

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back; C<p> should be able to survive conversion to C<char*>
and back.

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> is decremented at the end of the
I<pseudo-block>.  This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>.  However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope.  These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count.  This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of the I<pseudo-block>.

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of the I<pseudo-block>.

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of the I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of the I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed.  If one has a I<key> in
short-lived storage, the string should first be copied into freshly
allocated storage, like this:

  SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of the I<pseudo-block> the function C<f> is called with C<p>
as its only argument.

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of the I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of the I<pseudo-block>.

=back
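
The macros above share one mechanism: a save stack of 'undo' records. C<ENTER> remembers the current depth and each C<SAVE*> pushes a record. C<LEAVE> then replays the records back down to that depth, newest first. The self-contained sketch below models this idea for two record kinds only. C<toy_enter>, C<toy_leave>, C<toy_saveint> and C<toy_savedestructor> are hypothetical names loosely mirroring C<ENTER>, C<LEAVE>, C<SAVEINT> and C<SAVEDESTRUCTOR>. Perl's real save stack differs in detail.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX   64
#define TOY_SCOPE_MAX  16

typedef void (*toy_destructor)(void *);

/* One undo record: either "restore this int" or "call this function". */
typedef struct {
    enum { TOY_SAVE_INT, TOY_SAVE_DESTRUCTOR } kind;
    int           *iptr;   /* TOY_SAVE_INT: where to restore */
    int            ival;   /* TOY_SAVE_INT: value to restore */
    toy_destructor fn;     /* TOY_SAVE_DESTRUCTOR: function to call */
    void          *arg;    /* TOY_SAVE_DESTRUCTOR: its only argument */
} toy_save_entry;

static toy_save_entry save_stack[TOY_SAVE_MAX];
static int save_ix;                     /* next free save_stack slot */
static int scope_stack[TOY_SCOPE_MAX];  /* save_ix recorded by toy_enter() */
static int scope_ix;

/* Like ENTER: remember how deep the save stack currently is. */
static void toy_enter(void)
{
    assert(scope_ix < TOY_SCOPE_MAX);
    scope_stack[scope_ix++] = save_ix;
}

/* Like SAVEINT(i): record the current value so it can be restored. */
static void toy_saveint(int *i)
{
    assert(save_ix < TOY_SAVE_MAX);
    save_stack[save_ix].kind = TOY_SAVE_INT;
    save_stack[save_ix].iptr = i;
    save_stack[save_ix].ival = *i;
    save_ix++;
}

/* Like SAVEDESTRUCTOR(f, p): call f(p) when the pseudo-block ends. */
static void toy_savedestructor(toy_destructor f, void *p)
{
    assert(save_ix < TOY_SAVE_MAX);
    save_stack[save_ix].kind = TOY_SAVE_DESTRUCTOR;
    save_stack[save_ix].fn   = f;
    save_stack[save_ix].arg  = p;
    save_ix++;
}

/* Like LEAVE: undo everything saved since the matching toy_enter(),
 * most recent record first. */
static void toy_leave(void)
{
    int floor;
    assert(scope_ix > 0);
    floor = scope_stack[--scope_ix];
    while (save_ix > floor) {
        toy_save_entry *e = &save_stack[--save_ix];
        if (e->kind == TOY_SAVE_INT)
            *e->iptr = e->ival;
        else
            e->fn(e->arg);
    }
}

/* Example destructor: increments the int it is given. */
static void toy_incr(void *p) { ++*(int *)p; }
```

For example, C<toy_enter(); toy_saveint(&x); x = 2; toy_leave();> leaves C<x> with its original value. Nesting two such blocks restores each level in turn.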

The following API list contains functions rather than macros, so one needs
to provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s).  Where the above macros take C<int>, a similar
function takes C<int *>.

=over 4

=item C<SV* save_scalar(GV *gv)>

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=item C<HV* save_hash(GV *gv)>

Similar to C<save_scalar>, but localizes C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

Duplicates the current value of C<item>; on exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<item> is restored
from the stored copy.

=item C<void save_list(SV **sarg, I32 maxsarg)>

A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.

=item C<SV* save_svref(SV **sptr)>

Similar to C<save_scalar>, but will reinstate an C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

Similar to C<save_svref>, but localizes C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>.  People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument.  Argument 0 is the first argument passed in the
Perl subroutine call.  These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives.  However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments but returns
two values: the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro. The pushed values will often need to be "mortal" (See
L</Reference Counts and Mortality>).

    PUSHs(sv_2mortal(newSViv(an_integer)));
    PUSHs(sv_2mortal(newSVpv("Some String",0)));
    PUSHs(sv_2mortal(newSVnv(3.141592)));

The Perl program calling C<tzname> can then assign the two values
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method of pushing values on the stack is
to use the macro:

    XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed.  Thus, you
do not need to call C<EXTEND> to extend the stack.
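
A growable array can model the difference between C<EXTEND> followed by C<PUSHs> and the self-extending C<XPUSHs>. C<toy_stack>, C<toy_extend>, C<toy_push> and C<toy_xpush> are hypothetical names, and the slots hold plain C<long>s instead of mortal C<SV*>s. The point is only that C<PUSHs> assumes room was made in advance, while C<XPUSHs> checks before every push.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical argument stack: 'sp' counts used slots, 'cap' is the size. */
typedef struct {
    long  *base;
    size_t sp;
    size_t cap;
} toy_stack;

/* Like EXTEND(SP, n): make sure at least n more slots are free.
 * Returns 0 if the allocation failed. */
static int toy_extend(toy_stack *st, size_t n)
{
    if (st->cap - st->sp < n) {
        size_t newcap = st->sp + n;
        long *p = realloc(st->base, newcap * sizeof *p);
        if (p == NULL)
            return 0;
        st->base = p;
        st->cap  = newcap;
    }
    return 1;
}

/* Like PUSHs: no check; the caller must have called toy_extend() first. */
static void toy_push(toy_stack *st, long v)
{
    assert(st->sp < st->cap);       /* fires if the EXTEND was forgotten */
    st->base[st->sp++] = v;
}

/* Like XPUSHs: extend by one if needed, then push. */
static int toy_xpush(toy_stack *st, long v)
{
    if (!toy_extend(st, 1))
        return 0;
    toy_push(st, v);
    return 1;
}
```

Calling C<toy_push> without a preceding C<toy_extend> trips the assertion. This is the toy's version of writing past the end of the Perl stack.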

Despite suggestions in earlier versions of this document, the macros
C<PUSHi>, C<PUSHn> and C<PUSHp> are I<not> suited to XSUBs which return
multiple results; see L</Putting a C value on Perl stack>.

For more information, consult L<perlxs> and L<perlxstut>.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program.  These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, register char**);

The routine most often used is C<call_sv>.  The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine.  The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of values that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack.  This involves the following macros and
functions:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.

=head2 Memory Allocation

=head3 Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section.  The macros hide the differences
between the various malloc implementations that perl may be built with.

It is suggested that you enable the version of malloc that is distributed
with Perl.  It keeps pools of various sizes of unallocated memory in
order to satisfy allocation requests more quickly.  However, on some
platforms, it may cause spurious malloc or free errors.

The following three macros are used to initially allocate memory:

    New(x, pointer, number, type);
    Newc(x, pointer, number, type, cast);
    Newz(x, pointer, number, type);

The first argument C<x> was a "magic cookie" that was used to keep track
of who called the macro, to help when debugging memory problems.  However,
the current code makes no use of this feature (most Perl developers now
use run-time memory checkers), so this argument can be any number.

The second argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The third and fourth arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated.  The argument
C<type> is passed to C<sizeof>.  The final argument to C<Newc>, C<cast>,
is the type that C<pointer> points to; use C<Newc> when that type differs
from C<type>.

Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero>
to zero out all the newly allocated memory.

=head3 Reallocation

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer);

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed.  The arguments to C<Renew> and C<Renewc>
match those of C<New> and C<Newc> with the exception of not needing the
"magic cookie" argument.
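
Below is a rough portable-C equivalent of the allocation and reallocation macros. It assumes plain C<malloc>, C<calloc>, C<realloc> and C<free> underneath. The C<TOY_*> macros and C<toy_demo> are hypothetical. Perl's own macros go through its C<safemalloc> family, which normally treats running out of memory as fatal rather than returning C<NULL>. Extension code should therefore keep using C<New>, C<Newc>, C<Newz>, C<Renew> and C<Safefree>.

```c
#include <assert.h>
#include <stdlib.h>

/* The unused cookie 'x' is kept only to mirror the argument order. */
#define TOY_New(x, v, n, t)      ((void)(x), (v) = (t *)malloc((n) * sizeof(t)))
#define TOY_Newc(x, v, n, t, c)  ((void)(x), (v) = (c *)malloc((n) * sizeof(t)))
#define TOY_Newz(x, v, n, t)     ((void)(x), (v) = (t *)calloc((n), sizeof(t)))
/* Assigns straight back like Renew; with plain realloc a NULL result would
 * lose the old block, which perl sidesteps by treating failure as fatal. */
#define TOY_Renew(v, n, t)       ((v) = (t *)realloc((v), (n) * sizeof(t)))
#define TOY_Safefree(v)          free(v)

/* Allocate n zeroed ints, then grow the block to 2n ints.  The first n
 * stay zero; the new tail is uninitialized. */
static int *toy_demo(size_t n)
{
    int *buf;
    TOY_Newz(0, buf, n, int);
    assert(buf != NULL);
    TOY_Renew(buf, 2 * n, int);
    return buf;
}
```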

=head3 Moving

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory.  The C<source> and C<dest> arguments point to the source and
destination starting points.  Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
operator).
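
Unlike C<memcpy> and friends, these macros take the source before the destination. A minimal sketch of equivalent definitions on top of the standard C library follows; the C<TOY_*> macros and C<toy_shift_right> are hypothetical. The mapping follows perl's own definitions in F<handy.h>. C<Move> maps to C<memmove>, which tolerates overlapping regions, and C<Copy> maps to C<memcpy>, which does not. C<Zero> maps to a zero fill.

```c
#include <assert.h>
#include <string.h>

/* Source first, destination second: the reverse of memmove/memcpy. */
#define TOY_Move(s, d, n, t)  ((void)memmove((d), (s), (n) * sizeof(t)))  /* overlap OK */
#define TOY_Copy(s, d, n, t)  ((void)memcpy((d), (s), (n) * sizeof(t)))   /* no overlap */
#define TOY_Zero(d, n, t)     ((void)memset((d), 0, (n) * sizeof(t)))

/* Shift the first 'len' elements of 'a' right by one slot (an overlapping
 * move, so Move rather than Copy) and clear slot 0.  'a' must have room
 * for len + 1 elements. */
static void toy_shift_right(int *a, size_t len)
{
    assert(a != NULL);
    TOY_Move(a, a + 1, len, int);
    TOY_Zero(a, 1, int);
}
```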
1510*0Sstevel@tonic-gate
1511*0Sstevel@tonic-gate=head2 PerlIO
1512*0Sstevel@tonic-gate
1513*0Sstevel@tonic-gateThe most recent development releases of Perl has been experimenting with
1514*0Sstevel@tonic-gateremoving Perl's dependency on the "normal" standard I/O suite and allowing
1515*0Sstevel@tonic-gateother stdio implementations to be used.  This involves creating a new
1516*0Sstevel@tonic-gateabstraction layer that then calls whichever implementation of stdio Perl
1517*0Sstevel@tonic-gatewas compiled with.  All XSUBs should now use the functions in the PerlIO
1518*0Sstevel@tonic-gateabstraction layer and not make any assumptions about what kind of stdio
1519*0Sstevel@tonic-gateis being used.
1520*0Sstevel@tonic-gate
1521*0Sstevel@tonic-gateFor a complete description of the PerlIO abstraction, consult L<perlapio>.

=head2 Putting a C value on Perl stack

Many opcodes (an opcode is an elementary operation in Perl's internal
stack machine) put an SV* on the stack. However, as an optimization,
the corresponding SV is (usually) not recreated each time. Instead, the
opcodes reuse specially assigned SVs (I<target>s), which as a consequence
are not constantly freed and re-created.

Each of the targets is created only once (but see
L<Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on the stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on the stack.

The macro to put this target on the stack is C<PUSHTARG>, and it is
used directly in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[pni]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack. The following code will not do what you think:

    XPUSHi(10);
    XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20. If you need to push multiple different values, use C<XPUSHs>
with a separate SV for each value; C<XPUSHs> bypasses C<TARG>.
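To make the aliasing concrete, here is a small self-contained C model of it. The types and functions (C<toy_sv>, C<toy_xpushi>, C<toy_xpushs>) are hypothetical stand-ins for an SV, C<XPUSHi> and C<XPUSHs>. They are not Perl's API, only a sketch of the pointer-sharing effect described above.

```c
#include <stddef.h>

/* Toy stand-in for an SV holding an integer (not Perl's real SV). */
typedef struct { long iv; } toy_sv;

#define TOY_STACK_MAX 8
static toy_sv *toy_stack[TOY_STACK_MAX];  /* stack of SV pointers */
static size_t toy_sp = 0;                 /* number of entries pushed */

/* The opcode's single, reused target (TARG). */
static toy_sv toy_targ;

static void toy_push(toy_sv *sv)
{
    if (toy_sp < TOY_STACK_MAX)           /* silently ignore overflow in this toy */
        toy_stack[toy_sp++] = sv;
}

/* Like XPUSHi: store the value in TARG, then push a pointer to TARG. */
static void toy_xpushi(long v)
{
    toy_targ.iv = v;
    toy_push(&toy_targ);
}

/* Like XPUSHs: push whatever SV the caller hands over, bypassing TARG. */
static void toy_xpushs(toy_sv *sv)
{
    toy_push(sv);
}
```

After two C<toy_xpushi> calls both stack slots point at the same target, which holds 20. In a real XSUB the usual way to push two distinct integers is C<XPUSHs(sv_2mortal(newSViv(10)));> followed by C<XPUSHs(sv_2mortal(newSViv(20)));>, where each call creates its own mortal SV.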

On a related note, if you do use C<(X)PUSH[pni]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>.

=head2 Scratchpads

The question remains of when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current unit
(a subroutine, or a file for opcodes in statements outside of
subroutines) is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current
unit.

A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes. One can deduce that an SV lives on a scratchpad
by looking at its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set.

The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

Strictly speaking, it is not true that a compiled unit contains a pointer
to the scratchpad AV. It actually contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe B<threads>. Both
of these can create several execution pointers going into the same
subroutine. For the child call not to overwrite the temporaries
of the parent call (whose lifespan covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine, Perl checks whether the current
depth of the recursion exceeds the length of this array, and
if it does, a new scratchpad is created and pushed onto the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.
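The bookkeeping can be modelled in plain C. In this illustrative sketch, C<toy_cv>, C<toy_pad> and the functions around them are hypothetical (Perl's real padlist is an AV of AVs). A recursive factorial keeps its own C<$n> in a per-depth pad, and the array of pads grows only when the recursion goes deeper than it has gone before.

```c
#include <stdlib.h>

#define PAD_SLOTS 4

/* One scratchpad: slots for lexicals and targets at one recursion depth. */
typedef struct {
    long slot[PAD_SLOTS];
} toy_pad;

/* Hypothetical model of a compiled sub: its padlist is an array of pads. */
typedef struct {
    toy_pad **pads;   /* pads[d - 1] belongs to recursion depth d */
    int npads;        /* current length of the padlist */
    int depth;        /* current recursion depth (0 = not running) */
} toy_cv;

/* On entry, bump the depth; if it now exceeds the padlist length, push a
 * fresh, zeroed pad.  Returns the pad for this depth, or NULL on OOM. */
static toy_pad *toy_enter_sub(toy_cv *cv)
{
    cv->depth++;
    if (cv->depth > cv->npads) {
        toy_pad **grown = realloc(cv->pads, (size_t)cv->depth * sizeof *grown);
        if (!grown) { cv->depth--; return NULL; }
        cv->pads = grown;
        grown[cv->npads] = calloc(1, sizeof(toy_pad));
        if (!grown[cv->npads]) { cv->depth--; return NULL; }
        cv->npads++;
    }
    return cv->pads[cv->depth - 1];
}

static void toy_leave_sub(toy_cv *cv) { cv->depth--; }

/* Recursive factorial whose "my $n" lives in slot 0 of its own pad. */
static long toy_fact(toy_cv *cv, long n)
{
    toy_pad *pad = toy_enter_sub(cv);
    if (!pad) return -1;
    pad->slot[0] = n;
    long r = (n <= 1) ? 1 : n * toy_fact(cv, n - 1);
    /* The child ran on a different pad, so our slot 0 is untouched. */
    if (pad->slot[0] != n) r = -1;
    toy_leave_sub(cv);
    return r;
}
```

Calling C<toy_fact> for 5 leaves five pads behind; a later, shallower call reuses them instead of allocating more.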

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

  $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated).  This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes.  In our simplified
example above it looks like:

     $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>.  As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.
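The separation between tree links and the execution thread can be sketched in C. The C<toy_op> struct below is hypothetical: it borrows the field names C<op_first>, C<op_sibling> and C<op_next> from Perl's C<OP>, but it is not Perl's definition. It builds the simplified tree above and follows C<op_next> from the first op to run, which yields C<$b $c + $a assign-to> even though the root of the tree is C<assign-to>.

```c
#include <stddef.h>

/* Hypothetical op node: tree links plus the execution-order thread.
 * The field names echo Perl's OP, but this is not Perl's struct. */
typedef struct toy_op {
    const char *name;
    struct toy_op *op_first;    /* first child in the parse tree */
    struct toy_op *op_sibling;  /* next child of the same parent */
    struct toy_op *op_next;     /* next op to execute */
} toy_op;

/* Fill n[0..4] with the simplified tree for `$a = $b + $c` and return
 * the first op to execute (not the root, which is n[4]). */
static toy_op *toy_build_assign(toy_op n[5])
{
    toy_op *b = &n[0], *c = &n[1], *add = &n[2], *a = &n[3], *asg = &n[4];
    /*               name         op_first op_sibling op_next */
    *b   = (toy_op){"$b",        NULL,    c,         c};
    *c   = (toy_op){"$c",        NULL,    NULL,      add};
    *add = (toy_op){"+",         b,       a,         a};
    *a   = (toy_op){"$a",        NULL,    NULL,      asg};
    *asg = (toy_op){"assign-to", add,     NULL,      NULL};
    return b;
}

/* Collect op names by following op_next, i.e. in execution order. */
static size_t toy_exec_order(const toy_op *start, const char **out, size_t max)
{
    size_t n = 0;
    for (const toy_op *o = start; o && n < max; o = o->op_next)
        out[n++] = o->name;
    return n;
}
```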

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line.  The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), of which only 3 are
not optimized away (one per number in the left column).  The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, thus it is C<3
4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the varying
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the last.
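The child links can be sketched with hypothetical C structs that mimic this layout: each "derived" op type begins with the base type, and a C<LISTOP>'s children are reached by starting at C<op_first> and following C<op_sibling>. The struct names are invented for illustration and are not Perl's real definitions, which carry many more fields.

```c
#include <stddef.h>

/* Hypothetical op layouts loosely modelled on Perl's OP, UNOP, BINOP and
 * LISTOP: each "derived" struct begins with the base struct, so a pointer
 * to it can also be used as a pointer to the base.  Not Perl's real types. */
typedef struct toy_op {
    const char *name;
    struct toy_op *op_sibling;  /* next child of the same parent, or NULL */
} toy_op;

typedef struct { toy_op base; toy_op *op_first; } toy_unop;                   /* one child */
typedef struct { toy_op base; toy_op *op_first; toy_op *op_last; } toy_binop;  /* two */
typedef struct { toy_op base; toy_op *op_first; toy_op *op_last; } toy_listop; /* any number */

/* Visit a LISTOP's children in order: start at op_first and follow
 * op_sibling.  The last child (op_last) has a NULL sibling, ending the walk. */
static size_t toy_listop_kids(const toy_listop *lo, const char **names, size_t max)
{
    size_t n = 0;
    for (const toy_op *kid = lo->op_first; kid && n < max; kid = kid->op_sibling)
        names[n++] = kid->name;
    return n;
}
```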

There are also two other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.

Another way to examine the tree is to use a compiler back-end module, such
as L<B::Concise>.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass.  This is optimization by
so-called "check routines".  The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread.  Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, the check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines or C<convert>, which in turn are
called from F<perly.y>.

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called, the returned node is
checked for being compile-time executable.  If it is (the value is
judged to be constant), it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead.  The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.
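The order of events in passes 1 and 1a can be modelled with a tiny, hypothetical integer expression tree: build a node, run its check routine, then try to fold it. Because construction is bottom-up, the inner C<2 + 3> is already a constant by the time the outer addition is checked, so C<(2 + 3) + 4> collapses to a single node. All names here are invented for the sketch, and the folding test is far simpler than Perl's real one.

```c
#include <stdlib.h>

/* Hypothetical op tree for integer arithmetic. */
enum toy_type { TOY_CONST, TOY_VAR, TOY_ADD };

typedef struct toy_op {
    enum toy_type type;
    long value;                   /* used when type == TOY_CONST */
    struct toy_op *first, *last;  /* operands when type == TOY_ADD */
} toy_op;

static toy_op *toy_new(enum toy_type t, long v, toy_op *first, toy_op *last)
{
    toy_op *o = malloc(sizeof *o);
    if (o) { o->type = t; o->value = v; o->first = first; o->last = last; }
    return o;
}

/* Check routine: returns the node to insert into the tree.  This one
 * leaves the node alone, so it just returns its argument. */
static toy_op *toy_ck_add(toy_op *o) { return o; }

/* Constant folding, run right after the check routine: if both operands
 * are constants, compute the result now, replace the subtree by a single
 * constant node, and free the old subtree (nothing links back to it yet). */
static toy_op *toy_fold(toy_op *o)
{
    if (o->type != TOY_ADD || !o->first || !o->last
        || o->first->type != TOY_CONST || o->last->type != TOY_CONST)
        return o;
    toy_op *k = toy_new(TOY_CONST, o->first->value + o->last->value, NULL, NULL);
    if (!k) return o;             /* out of memory: keep the unfolded tree */
    free(o->first);
    free(o->last);
    free(o);
    return k;
}

/* Constructor in the spirit of newBINOP: build, check, then fold. */
static toy_op *toy_new_add(toy_op *a, toy_op *b)
{
    toy_op *o = toy_new(TOY_ADD, 0, a, b);
    return o ? toy_fold(toy_ck_add(o)) : NULL;
}
```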

=head2 Compile pass 2: context propagation

When a context for a part of the compile tree is known, it is propagated
down through the tree.  At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue.  In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now.  To allow nodes to be
optimized away at this stage, such nodes are null()ified instead
of being free()d (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals).  These optimizations are
done in the subroutine peep().  Optimizations performed at this stage
are subject to the same restrictions as in pass 2.
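A simplified, hypothetical model of how the two passes cooperate: pass 2 can only retype an unneeded op as null, because other ops may still point at it, and a peephole walk in execution order can then splice nulled ops out of the C<op_next> chain. Perl's real peep() does considerably more (and must follow both branches of conditionals). The sketch only shows the splicing idea.

```c
#include <stddef.h>

/* Hypothetical op type codes; TOY_OP_NULL stands in for Perl's OP_NULL. */
enum { TOY_OP_NULL = 0, TOY_OP_WORK = 1 };

typedef struct toy_op {
    int type;                 /* TOY_OP_NULL marks an optimized-away op */
    struct toy_op *op_next;   /* execution-order thread */
} toy_op;

/* Pass 2 style: other ops may still point here via op_next, so the op
 * cannot be freed; it is "null()ified" by changing its type instead. */
static void toy_null(toy_op *o) { o->type = TOY_OP_NULL; }

/* Pass 3 style: walk in execution order and make each op's op_next skip
 * any run of nulled ops, so they are never dispatched at run time.
 * Assumes the starting op itself is not nulled, and ignores branches. */
static void toy_peep(toy_op *start)
{
    for (toy_op *o = start; o; o = o->op_next)
        while (o->op_next && o->op_next->type == TOY_OP_NULL)
            o->op_next = o->op_next->op_next;
}
```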

=head2 Pluggable runops

The compile tree is executed in a runops function.  There are two runops
functions: C<Perl_runops_standard> in F<run.c>, which is used normally,
and C<Perl_runops_debug> in F<dump.c>, which is used with DEBUGGING.  For
fine control over the execution of the compile tree it is possible to
provide your own runops function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs.  Then, in the BOOT section of your XS
file, add the line:

  PL_runops = my_runops;

This function should be as efficient as possible to keep your programs
running as fast as possible.
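The shape of such a loop, and of the hook that selects it, can be sketched in standalone C. Everything here is a hypothetical model: Perl's real runops functions take the interpreter context (C<pTHX>) and dispatch through each op's C<op_ppaddr>, while this toy drops the context and uses a plain C<pp> field. The "custom" loop is a copy of the standard one that also counts ops, in the copy-and-modify spirit suggested above.

```c
#include <stddef.h>

/* Hypothetical op whose "pp" function does the work and returns the next
 * op to run (NULL when finished), echoing Perl's op_ppaddr convention. */
typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*pp)(toy_op *);
    toy_op *op_next;
    long arg;
};

static long toy_acc;              /* stand-in for interpreter state */
static toy_op *toy_PL_op;         /* op about to run, like PL_op */
static unsigned long toy_ops_run; /* counted by the custom loop */

static toy_op *toy_pp_add(toy_op *o)
{
    toy_acc += o->arg;
    return o->op_next;
}

/* The plain loop: dispatch ops until one returns NULL. */
static int toy_runops_standard(void)
{
    while (toy_PL_op)
        toy_PL_op = toy_PL_op->pp(toy_PL_op);
    return 0;
}

/* A customised copy of the loop that also counts dispatched ops. */
static int toy_runops_counting(void)
{
    while (toy_PL_op) {
        toy_ops_run++;
        toy_PL_op = toy_PL_op->pp(toy_PL_op);
    }
    return 0;
}

/* The pluggable hook, like PL_runops: whichever loop it points to is used. */
static int (*toy_PL_runops)(void) = toy_runops_standard;
```

Swapping the hook (C<toy_PL_runops = toy_runops_counting;>) plays the role of the C<PL_runops = my_runops;> line in a BOOT section.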

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

Other useful functions are C<Perl_dump_sub>, which dumps the op tree of
the subroutine held in a C<GV>, and C<Perl_dump_packsubs>, which calls
C<Perl_dump_sub> on all the subroutines in a package.  For example, for
the main stash (thankfully, these are all XSUBs, so there is no op tree):

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

Finally, C<Perl_dump_all> dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use.  This smells a lot like an object, and
there are ways for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure,
or inside a thread-specific structure.  These structures contain all
the context, the state of that interpreter.

Two macros control the major Perl build flavors: MULTIPLICITY and
USE_5005THREADS.  The MULTIPLICITY build has a C structure
that packages all the interpreter state, and there is a similar thread-specific
data structure under USE_5005THREADS.  In both cases,
PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
support for passing in a "hidden" first argument that represents either
of these data structures.

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument.  To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private.  All functions whose names begin with C<S_> are private
(think "S" for "secret" or "static").  All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a
function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API.  If it doesn't, and you
think it should be (i.e., you need it for your extension), send mail via
L<perlbug> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing.  To solve this, the subroutines are named and
declared in a particular way.  Here's a typical start of a static
function used within the Perl guts:

  STATIC void
  S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

  void
  Perl_sv_setiv(pTHX_ SV* dsv, IV num)

C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
details of the interpreter's context.  THX stands for "thread", "this",
or "thingy", as the case may be.  (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
first argument containing the interpreter's context.  The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it.  If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument.  The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.
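The trailing-underscore trick can be reproduced in a few lines of standalone C. The macros below are hypothetical look-alikes, deliberately spelled C<pMY>/C<aMY> rather than C<pTHX>/C<aTHX> so they are not mistaken for Perl's own definitions. Built normally, they expand to nothing and the functions use one global interpreter; built with C<-DMY_IMPLICIT_CONTEXT>, every function gains a hidden first parameter, and the same source still compiles, provided callers have a C<my_perl> in scope.

```c
/* Hypothetical re-creation of the pTHX/aTHX naming scheme, spelled
 * pMY/aMY so it is not confused with Perl's own macros.  Build with
 * -DMY_IMPLICIT_CONTEXT to get the "hidden first argument" flavour. */
typedef struct my_interp { long calls; } my_interp;

#ifdef MY_IMPLICIT_CONTEXT
#  define pMY   my_interp *my_perl  /* prototype, no explicit args follow */
#  define pMY_  pMY,                /* prototype, explicit args follow */
#  define aMY   my_perl             /* argument, no explicit args follow */
#  define aMY_  aMY,                /* argument, explicit args follow */
#  define MY_STATE my_perl
#else
static my_interp my_global_interp;  /* the one and only interpreter */
#  define pMY   void
#  define pMY_                      /* expands to nothing: no extra parameter */
#  define aMY
#  define aMY_
#  define MY_STATE (&my_global_interp)
#endif

/* The same source compiles in both flavours. */
static long my_sv_setiv(pMY_ long *dst, long num)
{
    *dst = num;
    return ++MY_STATE->calls;       /* count calls on this interpreter */
}

static long my_calls(pMY)
{
    return MY_STATE->calls;
}

/* Convenience wrapper hiding the context, like perl's sv_setiv macro. */
#define my_setiv(d, n) my_sv_setiv(aMY_ (d), (n))
```

Note that C<pMY_> already contains its comma, which is why callers never write one after it themselves.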

When a core function calls another, it must pass the context.  This
is normally hidden via macros.  Consider C<sv_setiv>.  It expands into
something like this:

    #ifdef PERL_IMPLICIT_CONTEXT
      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setiv           Perl_sv_setiv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setiv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

This doesn't work so cleanly for varargs functions, though, as C macros
(at least before C99) need to know the number of arguments in advance.
Instead we either need to spell them out fully, passing C<aTHX_> as the
first argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument.  Instead
it uses C<dTHX;> to get the context from thread-local storage.  We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance.  (Passing an arg is
cheaper than grabbing it from thread-local storage.)
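The two calling styles for a varargs function can be compared in a standalone model. The names (C<toy_warner>, C<toy_warner_nocontext>, C<toy_set_context>) are hypothetical and the thread-local slot uses C11 C<_Thread_local>; this is a sketch of the idea, not a copy of Perl's F<util.c>. Both variants funnel into one C<va_list> worker; the context-free one first fetches the interpreter from the per-thread slot, which is the extra cost mentioned above.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical interpreter that records warnings it has been sent. */
typedef struct toy_interp { int warnings; char last[64]; } toy_interp;

/* Per-thread "current interpreter" slot (C11 thread-local storage). */
static _Thread_local toy_interp *toy_current;

static void toy_set_context(toy_interp *i) { toy_current = i; }

/* Shared worker that takes an explicit context and a va_list. */
static void toy_vwarner(toy_interp *my_perl, const char *fmt, va_list ap)
{
    my_perl->warnings++;
    vsnprintf(my_perl->last, sizeof my_perl->last, fmt, ap);
}

/* Context passed explicitly as the first argument: the cheap way. */
static void toy_warner(toy_interp *my_perl, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    toy_vwarner(my_perl, fmt, ap);
    va_end(ap);
}

/* Context-free variant: fetches the context from thread-local storage
 * first (the dTHX step), then does the same work. */
static void toy_warner_nocontext(const char *fmt, ...)
{
    toy_interp *my_perl = toy_current;
    if (!my_perl) return;             /* no interpreter set for this thread */
    va_list ap;
    va_start(ap, fmt);
    toy_vwarner(my_perl, fmt, ap);
    va_end(ap);
}

/* Source-compatible short name, like `#define warner Perl_warner_nocontext`. */
#define toy_warn toy_warner_nocontext
```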

You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core.  Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more.  Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow.  The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with PERL_IMPLICIT_CONTEXT enabled.

There are three ways to do this.  First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

        sv_setiv(sv, num);

in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:

        Perl_sv_setiv(Perl_get_context(), sv, num);

or to this otherwise:

        Perl_sv_setiv(sv, num);

You don't have to do anything new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your F<Foo.xs>:

        #define PERL_NO_GET_CONTEXT     /* we want efficiency */
        #include "EXTERN.h"
        #include "perl.h"
        #include "XSUB.h"

        static SV *my_private_function(int arg1, int arg2);

        static SV *
        my_private_function(int arg1, int arg2)
        {
            dTHX;       /* fetch context */
            /* ... further Perl API calls would go here ... */
            return newSViv(arg1 + arg2);
        }

        [... etc ...]

        MODULE = Foo            PACKAGE = Foo

        /* typical XSUB */

        SV *
        my_xsub(arg)
                int arg
            CODE:
                RETVAL = my_private_function(arg, 10);
            OUTPUT:
                RETVAL

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API.  (You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.)  No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.
1994*0Sstevel@tonic-gate
1995*0Sstevel@tonic-gateThe third, even more efficient way is to ape how it is done within
1996*0Sstevel@tonic-gatethe Perl guts:
1997*0Sstevel@tonic-gate
1998*0Sstevel@tonic-gate
1999*0Sstevel@tonic-gate        #define PERL_NO_GET_CONTEXT     /* we want efficiency */
2000*0Sstevel@tonic-gate        #include "EXTERN.h"
2001*0Sstevel@tonic-gate        #include "perl.h"
2002*0Sstevel@tonic-gate        #include "XSUB.h"
2003*0Sstevel@tonic-gate
2004*0Sstevel@tonic-gate        /* pTHX_ only needed for functions that call Perl API */
2005*0Sstevel@tonic-gate        static my_private_function(pTHX_ int arg1, int arg2);
2006*0Sstevel@tonic-gate
2007*0Sstevel@tonic-gate        static SV *
2008*0Sstevel@tonic-gate        my_private_function(pTHX_ int arg1, int arg2)
2009*0Sstevel@tonic-gate        {
2010*0Sstevel@tonic-gate            /* dTHX; not needed here, because THX is an argument */
2011*0Sstevel@tonic-gate            ... call Perl API functions ...
2012*0Sstevel@tonic-gate        }
2013*0Sstevel@tonic-gate
2014*0Sstevel@tonic-gate        [... etc ...]
2015*0Sstevel@tonic-gate
2016*0Sstevel@tonic-gate        MODULE = Foo            PACKAGE = Foo
2017*0Sstevel@tonic-gate
2018*0Sstevel@tonic-gate        /* typical XSUB */
2019*0Sstevel@tonic-gate
2020*0Sstevel@tonic-gate        void
2021*0Sstevel@tonic-gate        my_xsub(arg)
2022*0Sstevel@tonic-gate                int arg
2023*0Sstevel@tonic-gate            CODE:
2024*0Sstevel@tonic-gate                my_private_function(aTHX_ arg, 10);
2025*0Sstevel@tonic-gate
2026*0Sstevel@tonic-gateThis implementation never has to fetch the context using a function
2027*0Sstevel@tonic-gatecall, since it is always passed as an extra argument.  Depending on
2028*0Sstevel@tonic-gateyour needs for simplicity or efficiency, you may mix the previous
2029*0Sstevel@tonic-gatetwo approaches freely.
2030*0Sstevel@tonic-gate
2031*0Sstevel@tonic-gateNever add a comma after C<pTHX> yourself--always use the form of the
2032*0Sstevel@tonic-gatemacro with the underscore for functions that take explicit arguments,
2033*0Sstevel@tonic-gateor the form without the argument for functions with no explicit arguments.
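
To see why the comma belongs inside the macro, here is a standalone sketch that does not include F<perl.h>. It only mimics the shape of the C<pTHX> family. The C<MockInterp> type and the C<MOCK_MULTIPLICITY> switch are hypothetical, and the real definitions in F<perl.h> differ in detail. In the non-threaded branch C<pTHX_> expands to nothing and C<pTHX> to C<void>. A hand-written C<pTHX,> would therefore leave C<void,> in the parameter list and fail to compile.

    /* Standalone mock of the pTHX/aTHX macro family (not perl.h). */
    typedef struct { int id; } MockInterp;  /* stands in for PerlInterpreter */

    #ifdef MOCK_MULTIPLICITY
    #  define pTHX   MockInterp *my_perl    /* declare the context param */
    #  define pTHX_  pTHX,                  /* ...followed by more params */
    #  define aTHX   my_perl                /* pass the context along */
    #  define aTHX_  aTHX,
    #else
    #  define pTHX   void                   /* no context parameter at all */
    #  define pTHX_                         /* expands to nothing */
    #  define aTHX
    #  define aTHX_
    #endif

    /* Explicit arguments: use the underscore forms. */
    static int add_args(pTHX_ int a, int b)
    {
        return a + b;
    }

    /* No explicit arguments: use the plain forms. */
    static int answer(pTHX)
    {
        return 42;
    }

    static int sum_plus_answer(pTHX)
    {
        return add_args(aTHX_ 1, 2) + answer(aTHX);
    }

Compiled without C<MOCK_MULTIPLICITY>, the call C<sum_plus_answer(aTHX)> expands to C<sum_plus_answer()>. With it defined, every call passes C<my_perl> along, so the caller only needs a C<my_perl> variable in scope.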

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards.  If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter.  This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

	/* do this before doing anything else with some_perl */
	PERL_SET_CONTEXT(some_perl);

	... other Perl API calls on some_perl go here ...
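
C<PERL_SET_CONTEXT> is needed because the slot is per-thread. The standalone sketch below models this with POSIX threads and C11 C<_Thread_local>, and uses no Perl headers. C<mock_set_context> and C<mock_get_context> are hypothetical stand-ins for the TLS slot, not Perl API. A second thread sees an empty slot until it sets the slot itself.

    #include <pthread.h>
    #include <stddef.h>

    typedef struct { int id; } MockInterp;   /* stands in for an interpreter */

    /* One slot per thread, like perl's TLS slot (hypothetical names). */
    static _Thread_local MockInterp *current_interp = NULL;

    static void mock_set_context(MockInterp *p) { current_interp = p; }
    static MockInterp *mock_get_context(void) { return current_interp; }

    struct probe {
        MockInterp *interp;   /* interpreter to use in the new thread */
        MockInterp *before;   /* what the new thread saw initially */
        MockInterp *after;    /* what it saw after setting the slot */
    };

    static void *probe_thread(void *arg)
    {
        struct probe *pr = arg;
        pr->before = mock_get_context();   /* NULL: nothing set here yet */
        mock_set_context(pr->interp);      /* the PERL_SET_CONTEXT step */
        pr->after = mock_get_context();
        return NULL;
    }

    /* Creates an "interpreter" in this thread, then uses it in another.
       Returns 0 on success. */
    static int demo(struct probe *pr, MockInterp *interp)
    {
        pthread_t t;
        mock_set_context(interp);          /* like perl_alloc in its creator */
        pr->interp = interp;
        if (pthread_create(&t, NULL, probe_thread, pr) != 0)
            return -1;
        return pthread_join(t, NULL);
    }

After C<demo> returns, C<before> is a null pointer and C<after> points at the interpreter. The creating thread's own slot is unaffected by what the second thread did.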

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on.  This is enabled with the
PERL_IMPLICIT_SYS macro.  Currently it only works with USE_ITHREADS
and USE_5005THREADS on Windows (see inside iperlsys.h).

This makes it possible to provide an extra pointer (called the "host"
environment) for all the system calls.  That in turn lets the system
services maintain their own state, broken down into seven C structures.
These are thin wrappers around the usual system calls (see
win32/perllib.c) for the default perl executable, but for a more
ambitious host (like the one that would do fork() emulation) all the
extra work needed to pretend that different interpreters are actually
different "processes" would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.
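
Here is a rough standalone picture of the "host" idea. Interpreter-side code never calls the system directly. Instead it goes through a table of function pointers reached from the host pointer, and each host keeps its own state. The C<mock_host> structures below are invented for illustration. The real tables are declared in F<iperlsys.h> and cover far more than one call.

    #include <stddef.h>
    #include <string.h>

    struct mock_host;

    /* One "service table"; the real design has several. */
    struct mock_host_env {
        const char *(*getenv_fn)(struct mock_host *h, const char *name);
    };

    struct mock_host {
        const struct mock_host_env *env;   /* function table */
        const char *home;                  /* this host's private state */
    };

    /* A host that answers from its own state instead of the OS, the way
       a fork()-emulating host might keep pseudo-processes apart. */
    static const char *sandbox_getenv(struct mock_host *h, const char *name)
    {
        return strcmp(name, "HOME") == 0 ? h->home : NULL;
    }

    static const struct mock_host_env sandbox_env = { sandbox_getenv };

    /* Interpreter-side wrapper: always goes through the host pointer. */
    static const char *interp_getenv(struct mock_host *h, const char *name)
    {
        return h->env->getenv_fn(h, name);
    }

Two C<mock_host> values can share the same C<sandbox_env> table but hold different C<home> strings. They then answer C<HOME> differently without touching the real process environment.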

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core, you can get at the functions either with or
without the C<Perl_> prefix, thanks to a bunch of defines that live in
F<embed.h>. This header file is generated automatically from
F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces. It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well. Here's a sample entry from
that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The second column is the return type, the third column the name. Columns
after that are the arguments. The first column is a set of flags:

=over 3

=item A

This function is a part of the public API.

=item p

This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>.

=item d

This function has documentation using the C<apidoc> feature which we'll
look at in a second.

=back

Other available flags are:

=over 3

=item s

This is a static function and is defined as C<S_whatever>, and usually
called within the sources as C<whatever(...)>.

=item n

This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See
L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)

=item r

This function never returns; C<croak>, C<exit> and friends are examples.

=item f

This function takes a variable number of arguments, C<printf> style.
The argument list should end with C<...>, like this:

    Afprd   |void   |croak          |const char* pat|...

=item M

This function is part of the experimental development API, and may change
or disappear without notice.

=item o

This function should not have a compatibility macro to define, say,
C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.

=item x

This function isn't exported out of the Perl core.

=item m

This is implemented as a macro.

=item X

This function is explicitly exported.

=item E

This function is visible to extensions included in the Perl core.

=item b

Binary backward compatibility; this function is a macro but also has
a C<Perl_> implementation (which is exported).

=back

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.
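
To make the column layout concrete, here is a small standalone C sketch. It splits one F<embed.fnc> line at its C<|> separators and checks the flags column. C<split_fnc_line> and C<has_flag> are hypothetical helpers for illustration only. The real table is read by F<embed.pl>, which is written in Perl.

    #include <ctype.h>
    #include <string.h>

    #define MAX_FIELDS 16

    /* Splits line in place at '|' and trims blanks around each field.
       Stores up to max field pointers; returns how many it stored. */
    static int split_fnc_line(char *line, char *fields[], int max)
    {
        int n = 0;
        char *p = line;

        while (n < max) {
            char *bar = strchr(p, '|');
            char *end;

            if (bar)
                *bar = '\0';                        /* end this field */
            while (isspace((unsigned char)*p))
                p++;                                /* trim the left */
            end = p + strlen(p);
            while (end > p && isspace((unsigned char)end[-1]))
                end--;
            *end = '\0';                            /* trim the right */
            fields[n++] = p;
            if (!bar)
                break;
            p = bar + 1;
        }
        return n;
    }

    /* True if flag character f appears in the flags column. */
    static int has_flag(const char *flags, char f)
    {
        return f != '\0' && strchr(flags, f) != NULL;
    }

Applied to the C<av_fetch> entry above, it yields six fields: C<Apd>, C<SV**>, C<av_fetch>, C<AV* ar>, C<I32 key> and C<I32 lval>. C<has_flag> finds C<A>, C<p> and C<d> in the first field.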

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, use the following macros instead
of stdio(3)-style formatting codes like C<%d>, C<%ld>, C<%f>, for
portability:

        IVdf            IV in decimal
        UVuf            UV in decimal
        UVof            UV in octal
        UVxf            UV in hexadecimal
        NVef            NV %e-like
        NVff            NV %f-like
        NVgf            NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

        printf("IV is %"IVdf"\n", iv);

The IVdf will expand to whatever is the correct format for the IVs.

If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.
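
The C<IVdf> example above works because the macro is a string literal. C's adjacent-literal concatenation splices it into the format string. The standalone sketch below shows the same technique with C99's F<inttypes.h> macros. C<PRId64> and C<PRIxPTR> play the roles of C<IVdf> and C<UVxf> here. The sketch does not use Perl's own macros, whose values are chosen at Configure time.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* PRId64 is a string literal such as "ld" or "lld", spliced into
       the format just as IVdf is. */
    static int format_i64(char *buf, size_t n, int64_t v)
    {
        return snprintf(buf, n, "IV is %" PRId64, v);
    }

    /* Pointer printed via an integer type, in the spirit of UVxf with
       PTR2UV(), instead of %p. */
    static int format_ptr(char *buf, size_t n, const void *p)
    {
        return snprintf(buf, n, "%" PRIxPTR, (uintptr_t)p);
    }

The same source compiles cleanly whether the platform's 64-bit type is C<long> or C<long long>. Only the expansion of C<PRId64> changes.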

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

        PTR2UV(pointer)
        PTR2IV(pointer)
        PTR2NV(pointer)
        INT2PTR(pointertotype, integer)

For example:

        IV  iv = ...;
        SV *sv = INT2PTR(SV*, iv);

and

        AV *av = ...;
        UV  uv = PTR2UV(av);
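
Here is a standalone sketch of the same round trip using C99's C<uintptr_t>. Where the platform provides that type, it is wide enough to hold an object pointer. C<MY_PTR2UV>, C<MY_INT2PTR> and C<MockSV> are hypothetical names. Perl's own macros in F<perl.h> are defined differently.

    #include <stdint.h>

    /* Hypothetical analogues of PTR2UV() and INT2PTR(). */
    #define MY_PTR2UV(p)      ((uintptr_t)(p))
    #define MY_INT2PTR(t, i)  ((t)(uintptr_t)(i))

    typedef struct { int refcnt; } MockSV;   /* stands in for an SV */

    /* Stores a pointer in an integer and recovers it unchanged. */
    static MockSV *roundtrip(MockSV *sv)
    {
        uintptr_t uv = MY_PTR2UV(sv);
        return MY_INT2PTR(MockSV *, uv);
    }

Casting through a plain C<long> instead can silently truncate the pointer on platforms where C<long> is narrower than a pointer. Some 64-bit Windows builds are like this. Avoiding that truncation is the point of going through a dedicated macro.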

=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them. L<perlapi> is one
such manual, which details all the functions that are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

 /*
 =for apidoc sv_setiv

 Copies an integer into the given SV.  Does not handle 'set' magic.  See
 C<sv_setiv_mg>.

 =cut
 */

Please try to supply some documentation if you add functions to the
Perl core.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
a variable number of bytes to represent a character, instead of just
one. You can learn more about Unicode at http://www.unicode.org/

=head2 How can I recognise a UTF-8 string?

You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking; this
is what makes Unicode input an interesting problem.

The API function C<is_utf8_string> can help; it'll tell you if a string
contains only valid UTF-8 characters. However, it can't do the work for
you. On a character-by-character basis, C<is_utf8_char> will tell you
whether the current character in a string is valid UTF-8.
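
The following standalone sketch shows what a structural check in the spirit of C<is_utf8_string> looks at. It also shows why passing such a check says nothing about intent. The bytes C<v195.136> validate as UTF-8 whether they were meant as character 200 or as C<chr(195).chr(136)>. C<mock_is_utf8_string> is a hypothetical, simplified helper, not Perl's implementation. It checks lead and continuation bytes and rejects the overlong lead bytes C<0xC0> and C<0xC1>. It does not check surrogates, overlong three- or four-byte forms, or upper limits.

    #include <stddef.h>

    /* Returns 1 if s[0..len) is structurally well-formed UTF-8. */
    static int mock_is_utf8_string(const unsigned char *s, size_t len)
    {
        size_t i = 0;
        while (i < len) {
            unsigned char c = s[i];
            size_t need, k;
            if (c < 0x80)
                need = 0;                       /* ASCII: one byte */
            else if (c >= 0xC2 && c <= 0xDF)
                need = 1;                       /* two-byte lead */
            else if ((c & 0xF0) == 0xE0)
                need = 2;                       /* three-byte lead */
            else if ((c & 0xF8) == 0xF0)
                need = 3;                       /* four-byte lead */
            else
                return 0;                       /* stray continuation or bad lead */
            if (need > len - i - 1)
                return 0;                       /* truncated sequence */
            for (k = 1; k <= need; k++)
                if ((s[i + k] & 0xC0) != 0x80)
                    return 0;                   /* not a continuation byte */
            i += need + 1;
        }
        return 1;
    }

A lone byte 200, as stored in a non-UTF-8 string, fails the check because a two-byte lead byte needs a continuation byte after it.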

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>; this
continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
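
The rules above can be written down as a standalone C sketch covering the one- to three-byte forms, which reach code points up to 65535. C<mock_uv_to_utf8> is a hypothetical helper, not Perl's C<uv_to_utf8>. It assumes the caller never passes a value above 0xFFFF. It reproduces the worked values in the text: 128 becomes C<v194.128>, 192 becomes C<v195.128>, and 2048 is the first three-byte character, C<v224.160.128>.

    #include <stddef.h>

    /* Writes the UTF-8 form of uv (uv <= 0xFFFF) to d; returns the
       number of bytes written. */
    static size_t mock_uv_to_utf8(unsigned char *d, unsigned long uv)
    {
        if (uv < 0x80) {                 /* 0..127: 0xxxxxxx */
            d[0] = (unsigned char)uv;
            return 1;
        }
        if (uv < 0x800) {                /* 128..2047: 110xxxxx 10xxxxxx */
            d[0] = (unsigned char)(0xC0 | (uv >> 6));
            d[1] = (unsigned char)(0x80 | (uv & 0x3F));
            return 2;
        }
        /* 2048..65535: 1110xxxx 10xxxxxx 10xxxxxx */
        d[0] = (unsigned char)(0xE0 | (uv >> 12));
        d[1] = (unsigned char)(0x80 | ((uv >> 6) & 0x3F));
        d[2] = (unsigned char)(0x80 | (uv & 0x3F));
        return 3;
    }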

Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */
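
C<UTF8SKIP> can work from the first byte alone because the lead byte encodes the sequence length. The standalone sketch below computes that length directly. C<mock_utf8skip> is a hypothetical name, and Perl's real macro uses a lookup table instead. Given the same string as above, it also returns 2 and then 3.

    /* Length of the UTF-8 sequence starting at s, from the lead byte.
       Invalid lead bytes count as 1 so a scanner always makes progress. */
    static int mock_utf8skip(const unsigned char *s)
    {
        unsigned char c = *s;
        if (c < 0xC0) return 1;   /* ASCII, or a stray continuation byte */
        if (c < 0xE0) return 2;   /* 110xxxxx */
        if (c < 0xF0) return 3;   /* 1110xxxx */
        if (c < 0xF8) return 4;   /* 11110xxx */
        return 1;                 /* not a valid lead byte */
    }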

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.
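
Since bounds checking is left to you, one option is a wrapper that refuses to step past the end of the buffer. The standalone sketch below is a hypothetical forward-only hop, C<mock_utf8_hop_bounded>, not Perl's C<utf8_hop>. It derives each step from the lead byte, as C<UTF8SKIP> does. It stops at C<end> even if the last sequence is truncated.

    #include <stddef.h>

    /* Advances s by up to n characters, never going past end. */
    static const unsigned char *
    mock_utf8_hop_bounded(const unsigned char *s, const unsigned char *end,
                          size_t n)
    {
        while (n-- > 0 && s < end) {
            unsigned char c = *s;
            size_t skip = c < 0xC0 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
            /* Clamp to end rather than overrun on a truncated sequence. */
            s = (size_t)(end - s) < skip ? end : s + skip;
        }
        return s;
    }

Asking it to hop ten characters in the five-byte string from the C<UTF8SKIP> example simply returns C<end>.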

All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (UTF8_IS_INVARIANT() is a macro that tests
whether the byte can be encoded as a single byte even in UTF-8):

    U8 *utf;
    UV uv;	/* Note: a UV, not a U8, not a char */

    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uv(utf);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uv> to get the
value of the character; the inverse function C<uv_to_utf8> is available
for putting a UV into UTF-8:

    if (!UTF8_IS_INVARIANT(uv))
        /* Must treat this as UTF8 */
        utf8 = uv_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!
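
Here is a standalone sketch of that rule. To compare a UTF-8 buffer with a non-UTF-8 (Latin-1) buffer, decode the UTF-8 side to code points and compare those against the bytes of the other side. C<mock_utf8_to_uv> and C<mock_same_chars> are hypothetical helpers, not Perl API. They assume well-formed UTF-8 with at most three bytes per character.

    #include <stddef.h>

    /* Decodes one character at s and stores its byte length in *len. */
    static unsigned long mock_utf8_to_uv(const unsigned char *s, size_t *len)
    {
        if (s[0] < 0x80) {                        /* invariant byte */
            *len = 1;
            return s[0];
        }
        if (s[0] < 0xE0) {                        /* two-byte form */
            *len = 2;
            return ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        }
        *len = 3;                                 /* three-byte form */
        return ((unsigned long)(s[0] & 0x0F) << 12)
             | ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    }

    /* Returns 1 if UTF-8 text u and byte string b hold the same characters. */
    static int mock_same_chars(const unsigned char *u, size_t ulen,
                               const unsigned char *b, size_t blen)
    {
        size_t i = 0, j = 0, step;
        while (i < ulen && j < blen) {
            if (mock_utf8_to_uv(u + i, &step) != b[j])
                return 0;
            i += step;                            /* one character, 1-3 bytes */
            j++;                                  /* always one byte */
        }
        return i == ulen && j == blen;
    }

The UTF-8 bytes for "café" end in C<v195.169>, and the Latin-1 bytes end in C<chr(233)>. These match, because C<v195.169> decodes to 233. Comparing the UTF-8 buffer with its own raw bytes read as Latin-1 fails at the fourth byte, which is exactly the mismatch described above.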

=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with Unicode strings and non-Unicode strings
slightly differently. If a string has been identified as being UTF-8
encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
manipulate this flag with the following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF-8 but contains a byte sequence that could be UTF-8,
especially when combining non-UTF-8 and UTF-8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF-8 flag set, and act accordingly:

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
    if (SvUTF8(sv))
        SvUTF8_on(nsv);

In fact, your C<frobnicate> function should be made aware of whether or
not it's dealing with UTF-8 data, so that it can handle the string
appropriately.

Just passing an SV to an XS function and copying the data of the SV is
not enough to copy the UTF-8 flag. Passing a bare C<char *> to an XS
function is even worse, since the flag does not travel with it at all.
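
The point is easier to see with a toy model in which the string bytes and the flag are separate fields, as the PV and C<SVf_UTF8> are. C<MockSV> and C<mock_copy> below are hypothetical names in a standalone sketch, not Perl's SV layout. The line that copies C<utf8> is the step that the first C<frobnicate> example above forgets.

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char  *pv;      /* string buffer */
        size_t len;     /* length in bytes */
        int    utf8;    /* stands in for SVf_UTF8 */
    } MockSV;

    /* Copies the bytes and the flag; returns 0 on success. */
    static int mock_copy(MockSV *dst, const MockSV *src)
    {
        dst->pv = malloc(src->len + 1);
        if (!dst->pv)
            return -1;
        memcpy(dst->pv, src->pv, src->len);
        dst->pv[src->len] = '\0';
        dst->len  = src->len;
        dst->utf8 = src->utf8;   /* the step that is easy to forget */
        return 0;
    }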

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary
to upgrade one of the strings to UTF-8. If you've got an SV, the easiest
way to do this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.
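
The copy-versus-modify distinction can be sketched in standalone C. One function builds a new UTF-8 buffer from Latin-1 bytes and leaves its input alone. The reverse function works in place and refuses strings containing characters above 255. C<mock_bytes_to_utf8> and C<mock_utf8_to_bytes> are hypothetical helpers, and their signatures differ from Perl's C<bytes_to_utf8> and C<utf8_to_bytes>. The downgrade assumes well-formed UTF-8 input.

    #include <stddef.h>
    #include <stdlib.h>

    /* Returns a malloc'd UTF-8 copy of the len bytes at s (the caller
       frees it) and stores the new length in *outlen; s is untouched. */
    static unsigned char *mock_bytes_to_utf8(const unsigned char *s,
                                             size_t len, size_t *outlen)
    {
        unsigned char *d = malloc(2 * len + 1);   /* worst case: 2 bytes each */
        unsigned char *p = d;
        size_t i;
        if (!d)
            return NULL;
        for (i = 0; i < len; i++) {
            if (s[i] < 0x80) {
                *p++ = s[i];                                 /* invariant */
            } else {
                *p++ = (unsigned char)(0xC0 | (s[i] >> 6));  /* 0xC2 or 0xC3 */
                *p++ = (unsigned char)(0x80 | (s[i] & 0x3F));
            }
        }
        *p = '\0';
        *outlen = (size_t)(p - d);
        return d;
    }

    /* Downgrades well-formed UTF-8 in place.  Returns the new length, or
       -1 (leaving s unchanged) if some character is above 255. */
    static long mock_utf8_to_bytes(unsigned char *s, size_t len)
    {
        size_t i, j;
        for (i = 0; i < len; i++)      /* first pass: check only */
            if (s[i] > 0xC3)
                return -1;             /* lead byte of a char above 255 */
        for (i = j = 0; i < len; j++) {
            if (s[i] < 0x80) {
                s[j] = s[i];
                i += 1;
            } else {
                s[j] = (unsigned char)(((s[i] & 0x1F) << 6) | (s[i + 1] & 0x3F));
                i += 2;
            }
        }
        return (long)j;
    }

Upgrading the two bytes C<'A', chr(200)> produces the three-byte copy C<'A', 195, 136>, and the original buffer still holds 200. The in-place downgrade then restores the two bytes, but it rejects C<v196.172>, which is character 300.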

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a string is UTF-8 or not just by looking at its
bytes. You can tell if an SV is UTF-8 by looking at its C<SvUTF8> flag.
Don't forget to set the flag if something should be UTF-8. Treat the flag
as part of the PV, even though it's not: if you pass on the PV to
somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uv> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)>, in which case you can use C<*s>.

=item *

When writing a character C<uv> to a UTF-8 string, B<always> use
C<uv_to_utf8>, unless C<UTF8_IS_INVARIANT(uv)>, in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded. There are tricks you can use to
delay deciding whether you need to use a UTF-8 string until you get to a
high character; C<HALF_UPGRADE> is one of those.

=back

=head1 Custom Operators

Custom operator support is a new experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>).

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure you like: unary, binary,
list and so on.

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to make those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> as a key into the
C<PL_custom_op_descs> and C<PL_custom_op_names> hashes. This means you
need to enter a name and description for your op at the appropriate
place in the C<PL_custom_op_names> and C<PL_custom_op_descs> hashes.
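
The keying scheme can be modelled in standalone C. Several ops share one op type, so the address of each op's PP function is what tells them apart. Everything below (C<MockOP>, C<register_custom>, C<custom_name>) is hypothetical and only mirrors the idea. Perl itself stores the entries in the C<PL_custom_op_names> and C<PL_custom_op_descs> hashes. This sketch compares function pointers directly rather than turning the address into a hash key.

    #include <stddef.h>

    typedef struct mock_op MockOP;
    typedef MockOP *(*mock_ppaddr_t)(void);   /* stands in for a PP function type */

    struct mock_op {
        mock_ppaddr_t op_ppaddr;               /* the only field this sketch needs */
    };

    /* Dummy PP functions; real ones would work on the Perl stack. */
    static MockOP *pp_my_add(void)  { return NULL; }
    static MockOP *pp_my_mult(void) { return NULL; }

    #define MAX_CUSTOM 8

    static struct {
        mock_ppaddr_t fn;
        const char *name;
        const char *desc;
    } registry[MAX_CUSTOM];
    static int n_registered;

    /* Records a name and description for one PP function. */
    static int register_custom(mock_ppaddr_t fn, const char *name,
                               const char *desc)
    {
        if (n_registered == MAX_CUSTOM)
            return -1;
        registry[n_registered].fn = fn;
        registry[n_registered].name = name;
        registry[n_registered].desc = desc;
        n_registered++;
        return 0;
    }

    /* Looks an op up by its op_ppaddr; unregistered ops get a generic name. */
    static const char *custom_name(const MockOP *o)
    {
        int i;
        for (i = 0; i < n_registered; i++)
            if (registry[i].fn == o->op_ppaddr)
                return registry[i].name;
        return "custom";
    }

An op whose PP function was never registered still gets a name, but only the generic one. Registration exists to avoid this in error and warning messages.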

Forthcoming versions of C<B::Generate> (version 1.0 and above) should
directly support the creation of custom ops by name; C<Opcodes::Custom>
will provide functions which make it trivial to "register" custom ops to
the Perl interpreter.

=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>.  It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

perlapi(1), perlintern(1), perlxs(1), perlembed(1)