=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

=head2 Working with SVs

An SV can be created and loaded with one command.
There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (Size_t, usually defined as size_t in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

In the unlikely case of an SV requiring more complex initialisation, you
can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the NUL) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has value undef.

    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated  */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs.
The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>).
If you do not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char * ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for a trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).
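
The grow-only behaviour and the missing NUL byte can be pictured with a
small standalone C model. This is only a sketch under stated assumptions:
C<struct buf>, C<buf_grow> and C<buf_append> are hypothetical names that
imitate the relationship between C<SvCUR>, C<SvLEN> and C<SvGROW>. They are
not part of the Perl API and the code does not use the Perl headers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the string part of an SV. */
struct buf {
    char  *pv;   /* start of the string, in the role of SvPVX */
    size_t cur;  /* bytes in use, in the role of SvCUR        */
    size_t len;  /* bytes allocated, in the role of SvLEN     */
};

/* Model of SvGROW: reallocate only when newlen exceeds the current
 * allocation, and never shrink. Returns the (possibly moved) buffer,
 * or NULL if the allocation fails. */
static char *buf_grow(struct buf *b, size_t newlen)
{
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv  = p;
        b->len = newlen;
    }
    return b->pv;
}

/* Append n bytes. The caller asks for one extra byte for the NUL,
 * because buf_grow (like SvGROW) does not add it. */
static int buf_append(struct buf *b, const char *src, size_t n)
{
    if (buf_grow(b, b->cur + n + 1) == NULL)
        return -1;
    memcpy(b->pv + b->cur, src, n);
    b->cur += n;
    b->pv[b->cur] = '\0';
    return 0;
}
```

In this model, appending three bytes to an empty buffer leaves C<cur> at 3
and C<len> at 4, and a later C<buf_grow(&b, 2)> changes nothing, which is
the same "increase only" rule described for C<SvGROW>.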

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.
Its address can be used whenever an C<SV*> is needed.
However, you have to be careful when using C<&PL_sv_undef> as a value in AVs
or HVs (see L<AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

  % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
  SV = PVIV(0x8128450) at 0x81340f0
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    IV = 1  (OFFSET)
    PV = 0x8135781 ( "1" . ) "2345"\0
    CUR = 4
    LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvLEN>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.
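
The string offset hack described in this section can be modelled in a few
lines of standalone C. This is a sketch, not Perl's implementation:
C<struct str>, C<str_chop> and C<str_real_start> are hypothetical names, and
the C<offset> member merely plays the role that the IV slot plays when
C<OOK> is set.

```c
#include <stddef.h>

/* Hypothetical model of the fields that sv_chop adjusts. */
struct str {
    char  *pv;      /* visible start of the string (role of SvPVX)   */
    size_t cur;     /* visible length (role of SvCUR)                */
    size_t len;     /* visible allocation (role of SvLEN)            */
    size_t offset;  /* bytes chopped so far (role of the IV slot)    */
};

/* Discard everything before ptr, which must point inside
 * [s->pv, s->pv + s->cur]. No bytes are moved or freed. */
static void str_chop(struct str *s, char *ptr)
{
    size_t delta = (size_t)(ptr - s->pv);

    s->pv     += delta;
    s->cur    -= delta;
    s->len    -= delta;
    s->offset += delta;
}

/* The address that was originally allocated, and the only one that
 * may be handed back to the allocator. */
static char *str_real_start(const struct str *s)
{
    return s->pv - s->offset;
}
```

Starting from a six-byte allocation holding C<"12345"> and chopping one
byte, the model ends up with C<cur> 4, C<len> 5 and C<offset> 1, matching
the C<CUR>, C<LEN> and C<IV> values in the C<Dump> output above.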

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
routine rather than directly, so SvIOK is false. Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.
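
The lossy-conversion case can be illustrated with a standalone C model. It
is a sketch only: the flag names and values, C<struct num>, C<num_set_nv>
and C<num_get_iv> are all hypothetical (the real flag bits are defined in
F<sv.h>), and the model assumes the double fits in a C<long>.

```c
/* Hypothetical flag bits; the real ones live in sv.h and differ. */
enum {
    F_IOK  = 1 << 0,  /* public: integer value may be used directly */
    F_NOK  = 1 << 1,  /* public: double value may be used directly  */
    F_pIOK = 1 << 2,  /* private: integer slot has been filled in   */
    F_pNOK = 1 << 3   /* private: double slot has been filled in    */
};

struct num {
    unsigned flags;
    long     iv;
    double   nv;
};

/* Store a double: both the public and the private NV flags are set. */
static void num_set_nv(struct num *n, double v)
{
    n->nv    = v;
    n->flags = F_NOK | F_pNOK;
}

/* Fetch as an integer, caching the result in the integer slot. The
 * private flag is always set; the public flag is set only when the
 * conversion lost no precision. Assumes nv is within range of long. */
static long num_get_iv(struct num *n)
{
    if (!(n->flags & F_pIOK)) {
        n->iv     = (long)n->nv;
        n->flags |= F_pIOK;
        if ((double)n->iv == n->nv)
            n->flags |= F_IOK;
    }
    return n->iv;
}
```

In this model, fetching 3.7 as an integer yields 3 and leaves C<F_pIOK>,
C<F_pNOK> and C<F_NOK> set but not C<F_IOK>, which mirrors the flag
combination described above for an NV converted to an IV with loss.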

=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(I32 num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on AVs:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    I32   av_len(AV*);
    SV**  av_fetch(AV*, I32 key, I32 lval);
    SV**  av_store(AV*, I32 key, SV* val);

The C<av_len> function returns the highest index value in the array (just
like $#array in Perl). If the array is empty, -1 is returned.
The C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, I32 key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on HVs:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

These two functions check whether a hash table entry exists and delete it,
respectively.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval. The key and retlen
               arguments are return values for the key and its
               length. The value is the SV* return value */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:

    hash = 0;
    while (klen--)
        hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5);                  /* after 5.6 */

The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.
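
The steps above can be wrapped in an ordinary function for experimenting
outside of perl. This is a standalone sketch, not the macro itself:
C<model_perl_hash> is a hypothetical name, and the choices of C<uint32_t>
for the hash and of reading key bytes as C<unsigned char> are assumptions
made so that the result is the same on every platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Standalone model of the hash steps shown above, including the
 * final mixing step that the text says was added in 5.6. */
static uint32_t model_perl_hash(const char *key, size_t klen)
{
    /* Assumption: key bytes are treated as unsigned values. */
    const unsigned char *p = (const unsigned char *)key;
    uint32_t hash = 0;

    while (klen--)
        hash = (hash * 33) + *p++;
    hash = hash + (hash >> 5);
    return hash;
}
```

With this model the empty key hashes to 0, and the one-byte key C<"a">
(byte value 97) hashes to 97 + (97 >> 5) = 100.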

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. AVs use C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code.

Other problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that points to other data types
(including references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar

See the F<sv.h> header file for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L<Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

Upgrades rv to reference if not already one. Creates new SV for rv to
point to. If C<classname> is non-null, the SV is blessed into the specified
class. SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

Copies integer, unsigned integer or double into an SV whose reference is
C<rv>. SV is blessed if C<classname> is non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

Copies the pointer value (I<the address, not the string!>) into an SV whose
reference is C<rv>. SV is blessed if C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

Copies string into an SV whose reference is C<rv>. Set length to 0 to let
Perl calculate the string length. SV is blessed if C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, char* pv, STRLEN length);

Tests whether the SV is blessed into the specified class. It does not
check inheritance relationships.

    int sv_isa(SV* sv, const char* name);

Tests whether the SV is a reference to a blessed object.

    int sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. SV can be either
a reference to a blessed object or a string containing a class name. This
is the function implementing the C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV* get_sv("package::varname", TRUE);
    AV* get_av("package::varname", TRUE);
    HV* get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<TRUE> argument to enable certain extra features.
Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

    Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

    Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten.
At the internal level, however, reference counts can be
manipulated with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument. The C<newRV_inc> function, you will recall,
creates a reference to the specified argument. As a side effect,
it increments the argument's reference count. If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one. Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two. Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't! Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates. This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>.
Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.

There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later". Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function. The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, SAVETMPS and FREETMPS.
See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
For example an SV which is created just to pass a number to a called sub
is made mortal to have it cleaned up automatically when it's popped off
the stack. Similarly, results returned by XSUBs (which are pushed on the
stack) are often made mortal.
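
The bookkeeping described in this section can be traced with a toy model.
The sketch below is plain C and deliberately simplified. It is not perl's
implementation, every C<toy_> name is hypothetical, and a small fixed-size
array is assumed as a stand-in for perl's list of pending temporaries. It
shows the leak caused by an extra increment, and mortality as a decrement
that is merely postponed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these names belong to the Perl API. */
typedef struct toy_sv {
    int refcnt;             /* plays the role of SvREFCNT */
    bool freed;             /* set once the count reaches zero */
    struct toy_sv *target;  /* non-NULL if this toy SV is a reference */
} toy_sv;

/* Every value starts life with a reference count of 1. */
static void toy_init(toy_sv *sv) {
    sv->refcnt = 1;
    sv->freed = false;
    sv->target = NULL;
}

/* Drop one count; at zero the value is "destroyed", and a destroyed
 * reference releases the count it held on its referent. */
static void toy_refcnt_dec(toy_sv *sv) {
    if (--sv->refcnt == 0) {
        sv->freed = true;
        if (sv->target)
            toy_refcnt_dec(sv->target);
    }
}

/* Modelled on newRV_inc: the referent's count goes up by one. */
static void toy_rv_inc(toy_sv *rv, toy_sv *thing) {
    toy_init(rv);
    rv->target = thing;
    thing->refcnt++;
}

/* Modelled on newRV_noinc: the reference takes over the caller's count. */
static void toy_rv_noinc(toy_sv *rv, toy_sv *thing) {
    toy_init(rv);
    rv->target = thing;
}

/* Pending decrements; mortalizing beyond the capacity is silently
 * ignored in this toy. */
#define TOY_TMPS_MAX 8
static toy_sv *toy_tmps[TOY_TMPS_MAX];
static size_t toy_tmps_len = 0;

/* Modelled on sv_2mortal: schedule one decrement, do nothing else yet. */
static toy_sv *toy_2mortal(toy_sv *sv) {
    if (toy_tmps_len < TOY_TMPS_MAX)
        toy_tmps[toy_tmps_len++] = sv;
    return sv;
}

/* Modelled on FREETMPS: perform every scheduled decrement. */
static void toy_free_tmps(void) {
    while (toy_tmps_len > 0)
        toy_refcnt_dec(toy_tmps[--toy_tmps_len]);
}
```

In this model a value wrapped with C<toy_rv_inc> survives the destruction of
its only reference with a count of one, which is the leak described above,
whereas one wrapped with C<toy_rv_noinc> is destroyed along with it.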

To create a mortal variable, use the functions:

    SV* sv_newmortal()
    SV* sv_2mortal(SV*)
    SV* sv_mortalcopy(SV*)

The first call creates a mortal SV (with no value), the second converts an
existing SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and
the third creates a mortal copy of an existing SV.
Because C<sv_newmortal> gives the new SV no value, it must normally be given
one via C<sv_setpv>, C<sv_setiv>, etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that is multiple C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

You should be careful about creating mortal variables. Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times. Thinking of "Mortalization"
as deferred C<SvREFCNT_dec> should help to minimize such problems.
For example if you are passing an SV which you I<know> has high enough REFCNT
to survive its use on the stack you need not do any mortalization.
If you are not sure then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or
making a C<sv_mortalcopy> is safer.

The mortal routines are not just for SVs -- AVs and HVs can be
made mortal by passing their address (cast to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package. To get at the items in other packages, append the
string "::" to the package name. The items in the C<Foo> package are in
the stash C<Foo::> in C<PL_defstash>. The items in the C<Bar::Baz> package are
in the stash C<Baz::> in C<Bar::>'s stash.
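
The naming rule above, where each level of a nested package is looked up
under its own name with "::" appended, can be sketched in plain C. The helper
below is hypothetical and is not part of the Perl API. It only derives the
key used at each nesting level, and it assumes "::" is the only separator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Perl API.
 * Writes the stash key for nesting level `level` of package name `pkg`
 * into `out` (for "Bar::Baz": level 0 is "Bar::", level 1 is "Baz::").
 * Returns 1 on success, or 0 if there is no such level, a segment is
 * empty, or the buffer is too small. */
static int toy_stash_key(const char *pkg, size_t level,
                         char *out, size_t outlen) {
    const char *seg = pkg;
    for (;;) {
        const char *sep = strstr(seg, "::");
        size_t len = sep ? (size_t)(sep - seg) : strlen(seg);
        if (len == 0)
            return 0;                 /* empty name or empty segment */
        if (level == 0) {
            if (len + 3 > outlen)     /* segment + "::" + NUL */
                return 0;
            memcpy(out, seg, len);
            memcpy(out + len, "::", 3);
            return 1;
        }
        if (!sep)
            return 0;                 /* ran out of segments */
        seg = sep + 2;                /* step past the "::" */
        level--;
    }
}
```

For C<Bar::Baz>, level 0 is the key looked up in C<PL_defstash> and level 1
is the key looked up in the stash found there.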

To get the stash pointer for a particular package, use one of the functions:

    HV* gv_stashpv(const char* name, I32 create)
    HV* gv_stashsv(SV*, I32 create)

The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. The C<create> flag will create a new package if it is set.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternatively, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV* SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char* HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV* sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.
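
Why the flag for the I<first> value has to be switched back on can be seen in
a toy model of the flag handling just described. This is a plain C sketch
only. The struct layout and every C<toy_>/C<TOY_> name are hypothetical and
do not reflect how perl actually stores scalars.

```c
#include <string.h>

/* Toy stand-ins only; none of these names belong to the Perl API. */
enum { TOY_IOK = 1 << 0, TOY_POK = 1 << 1 };

typedef struct {
    unsigned flags;  /* which slots currently count as valid */
    long iv;         /* integer slot */
    char pv[32];     /* string slot */
} toy_sv;

/* Like the sv_set*v routines as described above: store the value,
 * turn on its own flag, and turn off every other flag. */
static void toy_setiv(toy_sv *sv, long iv) {
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

static void toy_setpv(toy_sv *sv, const char *s) {
    strncpy(sv->pv, s, sizeof sv->pv - 1);
    sv->pv[sizeof sv->pv - 1] = '\0';
    sv->flags = TOY_POK;
}

/* Like SvIOK_on: declare the integer slot valid again without
 * touching its contents. */
static void toy_iok_on(toy_sv *sv) {
    sv->flags |= TOY_IOK;
}

/* Build a value that is valid both as an integer and as a string. */
static toy_sv toy_make_dual(long code, const char *msg) {
    toy_sv sv = { 0, 0, "" };
    toy_setiv(&sv, code);  /* flags: IOK */
    toy_setpv(&sv, msg);   /* flags: POK only; iv slot still holds code */
    toy_iok_on(&sv);       /* flags: IOK and POK */
    return sv;
}
```

The integer written first is still in its slot after the string is set; only
its flag was cleared, which is why re-enabling the flag is enough.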

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int  dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", TRUE);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Magic Variables

[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        SV*         mg_obj;
        char*       mg_ptr;
        I32         mg_len;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the sv_magic function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>. Perl then continues by adding new magic
to the beginning of the linked list of magical features. Any prior entry
of the same type of magic is deleted. Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null and C<namlen> E<gt>= 0 a malloc'd
copy of the name is stored in C<mg_ptr> field.

The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L<Magic Virtual Tables> section below. The C<how> argument is also
stored in the C<mg_type> field. The value of C<how> should be chosen
from the set of macros C<PERL_MAGIC_foo> found in F<perl.h>.
Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented. If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer,
then C<obj> is merely stored, without the reference count being incremented.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function sv_unmagic:

    void sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL>, which is a structure of function pointers and stands for
"Magic Virtual Table" to handle the various operations that might be
applied to that variable.

The C<MGVTBL> has five pointers to the following routine types:

    int  (*svt_get)(SV* sv, MAGIC* mg);
    int  (*svt_set)(SV* sv, MAGIC* mg);
    U32  (*svt_len)(SV* sv, MAGIC* mg);
    int  (*svt_clear)(SV* sv, MAGIC* mg);
    int  (*svt_free)(SV* sv, MAGIC* mg);

This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 19 types (or 21 with overloading turned on). These different
structures contain pointers to various routines that perform additional
actions depending on which function is being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called. All the various routines for the various magical types begin
with C<magic_>.
NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.

The current kinds of Magic Virtual Tables are:

    mg_type
    (old-style char and macro)   MGVTBL          Type of magic
    --------------------------   ------          ----------------------------
    \0 PERL_MAGIC_sv             vtbl_sv         Special scalar variable
    A  PERL_MAGIC_overload       vtbl_amagic     %OVERLOAD hash
    a  PERL_MAGIC_overload_elem  vtbl_amagicelem %OVERLOAD hash element
    c  PERL_MAGIC_overload_table (none)          Holds overload table (AMT)
                                                 on stash
    B  PERL_MAGIC_bm             vtbl_bm         Boyer-Moore (fast string search)
    D  PERL_MAGIC_regdata        vtbl_regdata    Regex match position data
                                                 (@+ and @- vars)
    d  PERL_MAGIC_regdatum       vtbl_regdatum   Regex match position data
                                                 element
    E  PERL_MAGIC_env            vtbl_env        %ENV hash
    e  PERL_MAGIC_envelem        vtbl_envelem    %ENV hash element
    f  PERL_MAGIC_fm             vtbl_fm         Formline ('compiled' format)
    g  PERL_MAGIC_regex_global   vtbl_mglob      m//g target / study()ed string
    I  PERL_MAGIC_isa            vtbl_isa        @ISA array
    i  PERL_MAGIC_isaelem        vtbl_isaelem    @ISA array element
    k  PERL_MAGIC_nkeys          vtbl_nkeys      scalar(keys()) lvalue
    L  PERL_MAGIC_dbfile         (none)          Debugger %_<filename
    l  PERL_MAGIC_dbline         vtbl_dbline     Debugger %_<filename element
    m  PERL_MAGIC_mutex          vtbl_mutex      ???
    o  PERL_MAGIC_collxfrm       vtbl_collxfrm   Locale collate transformation
    P  PERL_MAGIC_tied           vtbl_pack       Tied array or hash
    p  PERL_MAGIC_tiedelem       vtbl_packelem   Tied array or hash element
    q  PERL_MAGIC_tiedscalar     vtbl_packelem   Tied scalar or handle
    r  PERL_MAGIC_qr             vtbl_qr         precompiled qr// regex
    S  PERL_MAGIC_sig            vtbl_sig        %SIG hash
    s  PERL_MAGIC_sigelem        vtbl_sigelem    %SIG hash element
    t  PERL_MAGIC_taint          vtbl_taint      Taintedness
    U  PERL_MAGIC_uvar           vtbl_uvar       Available for use by extensions
    v  PERL_MAGIC_vec            vtbl_vec        vec() lvalue
    V  PERL_MAGIC_vstring        (none)          v-string scalars
    w  PERL_MAGIC_utf8           vtbl_utf8       UTF-8 length+offset cache
    x  PERL_MAGIC_substr         vtbl_substr     substr() lvalue
    y  PERL_MAGIC_defelem        vtbl_defelem    Shadow "foreach" iterator
                                                 variable / smart parameter
                                                 vivification
    *  PERL_MAGIC_glob           vtbl_glob       GV (typeglob)
    #  PERL_MAGIC_arylen         vtbl_arylen     Array length ($#ary)
    .  PERL_MAGIC_pos            vtbl_pos        pos() lvalue
    <  PERL_MAGIC_backref        vtbl_backref    ???
    ~  PERL_MAGIC_ext            (none)          Available for use by extensions

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type. Some internals code makes use of this case
relationship.
However, 'v' and 'V' (vec and v-string) are in no way related.

The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
specifically for use by extensions and will not be used by perl itself.
Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information
to variables (typically objects). This is especially useful because
there is no way for normal perl code to corrupt this private information
(unlike using extra elements of a hash object).

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed. The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(pTHX_ IV, SV*);
        I32 (*uf_set)(pTHX_ IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below. Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict.  Typically, using the magic only on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it may also be appropriate to add an I32
'signature' at the top of the private data area and check that.

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets.  This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions.  Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

    MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */

This routine returns a pointer to the C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned.  Also,
if the SV is not of type SVt_PVMG, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has.  If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats.  Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below.
If you find yourself actually applying such information in this section, be
aware that the behavior may change in the future without warning.

The perl tie function associates a variable with an object that implements
the various FETCH, STORE, etc. methods.  To perform the equivalent of the
perl tie function from an XSUB, you must mimic this behaviour.  The code
below carries out the necessary steps: first it creates a new hash, and
then creates a second hash which it blesses into the class which will
implement the tie methods.  Lastly it ties the two hashes together, and
returns a reference to the new tied hash.  Note that the code below does
NOT call the TIEHASH method in the MyTie class -
see L<Calling Perl Routines from within C Programs> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", TRUE);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>.
It may also return NULL, indicating that the value did not
actually need to be stored in the array.  [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object.  If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>.  Note that the value so returned
does not need to be deallocated, as it is already mortal.  [MAYCHANGE] But
you will need to call C<mg_get()> on the returned value in order to actually
invoke the perl level "FETCH" method on the underlying TIE object.
Similarly, you may also call C<mg_set()> on the return value after possibly
assigning a suitable value to it using C<sv_setsv>, which will invoke the
"STORE" method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes.
They merely call C<mg_copy> to attach magic to the values that were meant
to be "stored" or "fetched".  Later calls to C<mg_get> and C<mg_set>
actually do the job of invoking the TIE methods on the underlying objects.
Thus the magic mechanism currently implements a kind of lazy access to
arrays and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants.  The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax.  The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

  {
    local $var = 2;
    ...
  }

This construction is I<approximately> equivalent to

  {
    my $oldvar = $var;
    $var = 2;
    ...
    $var = $oldvar;
  }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval>, etc.  It is a little bit
more efficient as well.

There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
die()).  A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like the boundaries of the enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used.  (In the second case the overhead of additional localization should
be almost negligible.)  Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.
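
As a rough illustration, a pseudo-block that localizes a C integer might
look like the sketch below.  It uses the C<SAVEI32> macro from the list
that follows; the variable C<my_depth> and the function C<run_with_depth>
are hypothetical names invented for this example, not part of the Perl API.

    #include "EXTERN.h"
    #include "perl.h"

    static I32 my_depth = 0;    /* hypothetical C variable to localize */

    static void
    run_with_depth(pTHX_ I32 depth)
    {
        ENTER;                  /* open the pseudo-block */
        SAVEI32(my_depth);      /* old value goes onto the save stack */
        my_depth = depth;       /* roughly: local $my_depth = depth */

        /* ... work that may call die() ... */

        LEAVE;                  /* my_depth is restored here */
    }

If the elided work exits via die(), the saved value should still be
restored when the save stack is unwound, which is the point of using the
save macros rather than restoring the variable by hand.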

Inside such a I<pseudo-block> the following services are available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of the enclosing I<pseudo-block>.

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>.  C<s> must be a pointer of a type which survives conversion to
C<SV*> and back; C<p> should be able to survive conversion to C<char*>
and back.

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of the
I<pseudo-block>.  This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>.  However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope.  These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count.  This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of I<pseudo-block>.

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>.  The
string pointed to by C<key> is Safefree()ed.  If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

  SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=back

The following API list contains functions, so one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s).  Where the above macros take C<int>, a similar
function takes C<int *>.

=over 4

=item C<SV* save_scalar(GV *gv)>

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=item C<HV* save_hash(GV *gv)>

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

Duplicates the current value of C<SV>.  On exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<SV> is restored
using the stored value.

=item C<void save_list(SV **sarg, I32 maxsarg)>

A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.

=item C<SV* save_svref(SV **sptr)>

Similar to C<save_scalar>, but will reinstate an C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>.  People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument.  Argument 0 is the first argument passed in the
Perl subroutine call.  These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.
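
To make the role of C<ST(n)> concrete, here is a minimal sketch of an XSUB
written directly in C rather than generated by xsubpp.  The package and
function name C<MyMod::sum_two> are hypothetical; C<dXSARGS>, C<items> and
C<XSRETURN> come from F<XSUB.h>.

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* Hypothetical XSUB: MyMod::sum_two($a, $b) returns $a + $b */
    XS(XS_MyMod_sum_two)
    {
        dXSARGS;                        /* sets up sp, ax and items */

        if (items != 2)
            croak("Usage: MyMod::sum_two(a, b)");
        {
            IV a = SvIV(ST(0));         /* first argument  */
            IV b = SvIV(ST(1));         /* second argument */

            /* Reuse slot 0 of the argument stack for the result. */
            ST(0) = sv_2mortal(newSViv(a + b));
        }
        XSRETURN(1);                    /* one value returned */
    }

In practice most XSUBs are written in XS and let xsubpp generate this
boilerplate; the sketch only shows where the stack arguments come from.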

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives.  However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two values: the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro.  The pushed values will often need to be "mortal" (See
L</Reference Counts and Mortality>).

    PUSHs(sv_2mortal(newSViv(an_integer)));
    PUSHs(sv_2mortal(newSVpv("Some String",0)));
    PUSHs(sv_2mortal(newSVnv(3.141592)));

Now, when the Perl program calls C<tzname>, the two values will be assigned
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method of pushing values on the stack is
to use the macro:

    XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed.  Thus, you
do not need to call C<EXTEND> to extend the stack.

Despite suggestions in earlier versions of this document, the macros
C<PUSHi>, C<PUSHn> and C<PUSHp> are I<not> suited to XSUBs which return
multiple results; see L</Putting a C value on Perl stack>.

For more information, consult L<perlxs> and L<perlxstut>.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program.  These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, register char**);

The routine most often used is C<call_sv>.
The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine.  The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of values that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack, using the following macros and
functions:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.
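
The macros listed above are typically combined in a fixed pattern.  The
following is a sketch of that pattern, modelled on the examples in
L<perlcall>; the Perl subroutine C<Adder::add> and the C function
C<call_add> are hypothetical and are assumed to exist only for the sake of
the example.

    #include "EXTERN.h"
    #include "perl.h"

    /* Calls the (hypothetical) Perl sub Adder::add($a, $b) in scalar
     * context and returns its result as an IV. */
    static IV
    call_add(pTHX_ IV a, IV b)
    {
        dSP;                            /* local copy of the stack pointer */
        I32 count;
        IV  result;

        ENTER;                          /* open a pseudo-block */
        SAVETMPS;                       /* remember the temporaries floor */

        PUSHMARK(SP);                   /* mark the start of the arguments */
        XPUSHs(sv_2mortal(newSViv(a)));
        XPUSHs(sv_2mortal(newSViv(b)));
        PUTBACK;                        /* publish our stack pointer */

        count = call_pv("Adder::add", G_SCALAR);

        SPAGAIN;                        /* reload the stack pointer */
        if (count != 1)
            croak("Adder::add returned %d values, expected 1", (int)count);
        result = POPi;                  /* pop the single return value */

        PUTBACK;
        FREETMPS;                       /* free the mortal arguments */
        LEAVE;                          /* close the pseudo-block */

        return result;
    }

The same skeleton applies to C<call_sv> and C<call_method>; only the call
itself and the flags change.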

=head2 Memory Allocation

=head3 Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section.  The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

It is suggested that you enable the version of malloc that is distributed
with Perl.  It keeps pools of various sizes of unallocated memory in
order to satisfy allocation requests more quickly.  However, on some
platforms, it may cause spurious malloc or free errors.

The following three macros are used to initially allocate memory:

    New(x, pointer, number, type);
    Newc(x, pointer, number, type, cast);
    Newz(x, pointer, number, type);

The first argument C<x> was a "magic cookie" that was used to keep track
of who called the macro, to help when debugging memory problems.  However,
the current code makes no use of this feature (most Perl developers now
use run-time memory checkers), so this argument can be any number.

The second argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The third and fourth arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated.  The argument
C<type> is passed to C<sizeof>.  The final argument to C<Newc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero>
to zero out all the newly allocated memory.

=head3 Reallocation

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer);

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed.  The arguments to C<Renew> and C<Renewc>
match those of C<New> and C<Newc> with the exception of not needing the
"magic cookie" argument.

=head3 Moving

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory.  The C<source> and C<dest> arguments point to the source and
destination starting points.  Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
operator).
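
The sketch below strings several of these macros together.  The helper
name C<dup_and_grow> is hypothetical, and the cookie value 0 is arbitrary.
Note that newer perls spell the allocation macros C<Newx>, C<Newxc> and
C<Newxz> and drop the cookie argument, so adjust accordingly if C<New> is
not available.

    #include "EXTERN.h"
    #include "perl.h"

    /* Copies len bytes from src into a fresh buffer, then grows the
     * buffer by extra bytes and zero-fills the new tail.  The caller
     * is expected to release the result with Safefree(). */
    static char *
    dup_and_grow(const char *src, STRLEN len, STRLEN extra)
    {
        char *buf;

        New(0, buf, len + 1, char);         /* len bytes plus a NUL */
        Copy(src, buf, len, char);          /* source first, then dest */
        buf[len] = '\0';

        Renew(buf, len + extra + 1, char);  /* buf may move here */
        Zero(buf + len, extra + 1, char);   /* clear the added space */

        return buf;
    }

A caller would use it along the lines of
C<char *p = dup_and_grow("abc", 3, 16); ... Safefree(p);>.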

=head2 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used.  This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with.  All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head2 Putting a C value on Perl stack

A lot of opcodes (an opcode is an elementary operation in the internal perl
stack machine) put an SV* on the stack.  However, as an optimization
the corresponding SV is (usually) not recreated each time.  The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L<Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on stack.

The macro to put this target on stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[pni]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack.  The following code will not do what you think:

    XPUSHi(10);
    XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20.  If you need to push multiple different values, use C<XPUSHs>,
which bypasses C<TARG>.

On a related note, if you do use C<(X)PUSH[npi]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>.

=head2 Scratchpads

The question remains on when the SVs which are I<target>s for opcodes
are created.  The answer is that they are created when the current unit --
a subroutine or a file (for opcodes for statements outside of
subroutines) -- is compiled.
During this time a special anonymous Perl
array is created, which is called a scratchpad for the current
unit.

A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes.  One can deduce that an SV lives on a scratchpad
by looking at its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set.

The correspondence between OPs and I<target>s is not 1-to-1.  Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

In fact it is not 100% true that a compiled unit contains a pointer to
the scratchpad AV.  In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV.  Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe B<threads>.  Both of
these can create several execution pointers going into the same
subroutine.  So that the subroutine-child does not overwrite the
temporaries of the subroutine-parent (whose lifespan covers the call to
the child), the parent and the child should have different
scratchpads.  (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

    $a = $b + $c;

This is converted to a tree similar to this one:

                 assign-to
               /           \
              +             $a
            /   \
          $b     $c

(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:

    $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line. The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), of which only 3 are
not optimized away (one per number in the left column).
The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, thus it is C<3
4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>.
The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the last.

There are also two other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.

Another way to examine the tree is to use a compiler back-end module, such
as L<B::Concise>.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass. This is optimization by
so-called "check routines". The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread.
Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, the check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines (or C<convert>), which in turn are
called from F<perly.y>.

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called the returned node is
checked for being compile-time executable. If it is (the value is
judged to be constant) it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead. The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.

=head2 Compile pass 2: context propagation

When a context for a part of compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue.
In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of being free()d (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). These optimizations are
done in the subroutine peep(). Optimizations performed at this stage
are subject to the same restrictions as in pass 2.

=head2 Pluggable runops

The compile tree is executed in a runops function. There are two runops
functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
control over the execution of the compile tree it is possible to provide
your own runops function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs.
Then, in the BOOT section of your XS
file, add the line:

    PL_runops = my_runops;

This function should be as efficient as possible to keep your programs
running as fast as possible.

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
op tree, C<Perl_dump_packsubs>, which calls C<Perl_dump_sub> on all the
subroutines in a package like so: (Thankfully, these are all xsubs, so
there is no op tree.)

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

and C<Perl_dump_all>, which dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use. This smells a lot like an object, and
there are ways for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure,
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.

Two macros control the major Perl build flavors: MULTIPLICITY and
USE_5005THREADS. The MULTIPLICITY build has a C structure
that packages all the interpreter state, and there is a similar thread-specific
data structure under USE_5005THREADS. In both cases,
PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
support for passing in a "hidden" first argument that represents both
data structures.

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin with C<S_> are private
(think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a
function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API.
If it doesn't, and you
think it should be (i.e., you need it for your extension), send mail via
L<perlbug> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing. To solve this, the subroutines are named and
declared in a particular way. Here's a typical start of a static
function used within the Perl guts:

    STATIC void
    S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

    void
    Perl_sv_setiv(pTHX_ SV* dsv, IV num)

C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
first argument containing the interpreter's context.
The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.

When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setiv>. It expands into
something like this:

    #ifdef PERL_IMPLICIT_CONTEXT
      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setiv           Perl_sv_setiv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setiv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance. Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead
it does C<dTHX;> to get the context from thread-local storage. We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance. (Passing an arg is
cheaper than grabbing it from thread-local storage.)

You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow. The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with PERL_IMPLICIT_CONTEXT enabled.

There are three ways to do this.
First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

    sv_setiv(sv, num);

in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:

    Perl_sv_setiv(Perl_get_context(), sv, num);

or to this otherwise:

    Perl_sv_setiv(sv, num);

You have to do nothing new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your Foo.xs:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    static SV *my_private_function(int arg1, int arg2);

    static SV *
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        ... call many Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, and a C<dTHX;> declaration at
the start of every function that will call the Perl API. (You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.

The third, even more efficient way is to ape how it is done within
the Perl guts:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    static SV *my_private_function(pTHX_ int arg1, int arg2);

    static SV *
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself: always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards. If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

    /* do this before doing anything else with some_perl */
    PERL_SET_CONTEXT(some_perl);

    ... other Perl API calls on some_perl go here ...

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS
and USE_5005THREADS on Windows (see inside F<iperlsys.h>).

It lets you provide an extra pointer (called the "host"
environment) for all the system calls.
This makes it possible for
all the system stuff to maintain its own state, broken down into
seven C structures. These are thin wrappers around the usual system
calls (see F<win32/perllib.c>) for the default perl executable, but for a
more ambitious host (like the one that would do fork() emulation) all
the extra work needed to pretend that different interpreters are
actually different "processes" would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core, you can get at the functions either with or
without the C<Perl_> prefix, thanks to a bunch of defines that live in
F<embed.h>. This header file is generated automatically from
F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces.
It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well. Here's a sample entry from
that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The second column is the return type, the third column the name. Columns
after that are the arguments. The first column is a set of flags:

=over 3

=item A

This function is a part of the public API.

=item p

This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>.

=item d

This function has documentation using the C<apidoc> feature which we'll
look at in a second.

=back

Other available flags are:

=over 3

=item s

This is a static function and is defined as C<S_whatever>, and usually
called within the sources as C<whatever(...)>.

=item n

This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See
L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)

=item r

This function never returns; C<croak>, C<exit> and friends.

=item f

This function takes a variable number of arguments, C<printf> style.
The argument list should end with C<...>, like this:

    Afprd   |void   |croak  |const char* pat|...

=item M

This function is part of the experimental development API, and may change
or disappear without notice.

=item o

This function should not have a compatibility macro to define, say,
C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.

=item x

This function isn't exported out of the Perl core.

=item m

This is implemented as a macro.

=item X

This function is explicitly exported.

=item E

This function is visible to extensions included in the Perl core.

=item b

Binary backward compatibility; this function is a macro but also has
a C<Perl_> implementation (which is exported).

=back

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

    IVdf    IV in decimal
    UVuf    UV in decimal
    UVof    UV in octal
    UVxf    UV in hexadecimal
    NVef    NV %e-like
    NVff    NV %f-like
    NVgf    NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

    printf("IV is %"IVdf"\n", iv);

The IVdf will expand to whatever is the correct format for the IVs.

If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

    PTR2UV(pointer)
    PTR2IV(pointer)
    PTR2NV(pointer)
    INT2PTR(pointertotype, integer)

For example:

    IV  iv = ...;
    SV *sv = INT2PTR(SV*, iv);

and

    AV *av = ...;
    UV  uv = PTR2UV(av);

=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them - L<perlapi> is one
such manual which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

    /*
    =for apidoc sv_setiv

    Copies an integer into the given SV.  Does not handle 'set' magic.  See
    C<sv_setiv_mg>.

    =cut
    */

Please try and supply some documentation if you add functions to the
Perl core.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support.
It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
a variable number of bytes to represent a character, instead of just
one.
You can learn more about Unicode at http://www.unicode.org/

=head2 How can I recognise a UTF-8 string?

You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200, (C<0xC8> for you hex types)
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking - this
is what makes Unicode input an interesting problem.

The API function C<is_utf8_string> can help; it'll tell you if a string
contains only valid UTF-8 characters. However, it can't do the work for
you. On a character-by-character basis, C<is_utf8_char> will tell you
whether the current character in a string is valid UTF-8.

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>; this
continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
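
The byte patterns described above can be reproduced with a few shifts
and masks. What follows is only a minimal sketch in plain C, not code
from the Perl core: the name C<sketch_encode_utf8> is made up for
illustration, and the sketch does not reject surrogates or handle Perl's
extended code points. It assumes the caller supplies a buffer of at
least four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Perl API function): encode the code point
 * cp as UTF-8 into out, which must have room for 4 bytes.  Returns the
 * number of bytes written, or 0 if cp is above U+10FFFF. */
static size_t sketch_encode_utf8(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                 /* 0..127: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 128..2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 2048..65535: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {            /* up to U+10FFFF: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                        /* outside the Unicode range */
}
```

With this sketch, 191 comes out as the bytes 194 and 191, 192 as 195 and
128, and 2048 is the first value that needs three bytes. Inside Perl
itself you would use the core's own conversion functions instead.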

Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.

All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (the UTF8_IS_INVARIANT() is a macro that tests
whether the byte can be encoded as a single byte even in UTF-8):

    U8 *utf;
    UV uv;      /* Note: a UV, not a U8, not a char */

    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uv(utf);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uv> to get the
value of the character; the inverse function C<uv_to_utf8> is available
for putting a UV into UTF-8:

    if (!UTF8_IS_INVARIANT(uv))
        /* Must treat this as UTF8 */
        utf8 = uv_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!

=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with Unicode strings and non-Unicode strings
slightly differently. If a string has been identified as being UTF-8
encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
manipulate this flag with the following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable results.
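
To see why the flag matters, consider how the length of the same bytes
changes depending on whether they are treated as UTF-8. The following is
a rough sketch in plain C rather than Perl's actual implementation;
C<sketch_length> and its C<is_utf8> parameter are hypothetical stand-ins
for C<length> and the C<SVf_UTF8> flag, and the input is assumed to be
well-formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration (not a Perl API function): return the
 * length of the buffer s in characters.  When is_utf8 is false every
 * byte is one character; when it is true, only bytes that are not
 * UTF-8 continuation bytes (10xxxxxx) start a new character. */
static size_t sketch_length(const unsigned char *s, size_t nbytes,
                            int is_utf8)
{
    size_t i;
    size_t chars = 0;

    assert(s != NULL);

    if (!is_utf8)
        return nbytes;

    for (i = 0; i < nbytes; i++) {
        if ((s[i] & 0xC0) != 0x80)   /* not a continuation byte */
            chars++;
    }
    return chars;
}
```

For the five bytes C<\305\233\340\240\201>, this sketch reports five
characters with the flag off and two with it on, which is the kind of
disagreement that produces wrong answers when the flag is lost.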

The problem comes when you have, for instance, a string that isn't
flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
especially when combining non-UTF-8 and UTF-8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF-8 flag set, and act accordingly:

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
    if (SvUTF8(sv))
        SvUTF8_on(nsv);

In fact, your C<frobnicate> function should be made aware of whether or
not it's dealing with UTF-8 data, so that it can handle the string
appropriately.

Since just passing an SV to an XS function and copying the data of
the SV is not enough to copy the UTF-8 flags, even less right is just
passing a C<char *> to an XS function.
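
One way to avoid losing the flag in your own C code is to keep the
bytes, the length and the flag together, so that they can only be copied
as a unit. The sketch below is an assumption about how you might
structure such code, not something the Perl core provides; the
C<sketch_str> type and C<sketch_str_copy> function are hypothetical
names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parts of an SV that matter here:
 * the string bytes, their length, and the UTF-8 flag. */
struct sketch_str {
    char  *bytes;
    size_t len;
    int    is_utf8;
};

/* Copy src into dst, carrying the flag along with the bytes.
 * Returns 0 on success or -1 if memory could not be allocated.
 * The caller owns dst->bytes and must free() it. */
static int sketch_str_copy(struct sketch_str *dst,
                           const struct sketch_str *src)
{
    assert(dst != NULL && src != NULL);

    dst->bytes = malloc(src->len + 1);
    if (dst->bytes == NULL)
        return -1;

    memcpy(dst->bytes, src->bytes, src->len);
    dst->bytes[src->len] = '\0';     /* keep it NUL-terminated */
    dst->len     = src->len;
    dst->is_utf8 = src->is_utf8;     /* the flag travels with the bytes */
    return 0;
}
```

A helper that takes such a structure instead of a bare C<char *> always
knows whether it is looking at UTF-8 data.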

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary
to upgrade one of the strings to UTF-8. If you've got an SV, the easiest
way to do this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a string is UTF-8 or not. You can tell if an SV
is UTF-8 by looking at its C<SvUTF8> flag.
Don't forget to set the flag if
something should be UTF-8. Treat the flag as part of the PV, even though
it's not - if you pass on the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uv> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character C<uv> to a UTF-8 string, B<always> use
C<uv_to_utf8>, unless C<UTF8_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded. There are tricks you can use to
delay deciding whether you need to use a UTF-8 string until you get to a
high character - C<HALF_UPGRADE> is one of those.

=back

=head1 Custom Operators

Custom operator support is a new experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>.)

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure - unary, binary, list and
so on - you like.

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to do those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages.
Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> as a key into the
C<PL_custom_op_descs> and C<PL_custom_op_names> hashes. This means you
need to enter a name and description for your op at the appropriate
place in the C<PL_custom_op_names> and C<PL_custom_op_descs> hashes.

Forthcoming versions of C<B::Generate> (version 1.0 and above) should
directly support the creation of custom ops by name; C<Opcodes::Custom>
will provide functions which make it trivial to "register" custom ops to
the Perl interpreter.

=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>.  It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

perlapi(1), perlintern(1), perlxs(1), perlembed(1)