=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, and also
contains some information on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, int);
    SV*  newSVpvn(const char*, int);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

If you require more complex initialisation you can create an empty SV with
newSV(len). If C<len> is 0 an empty SV of type NULL is returned, else an
SV of type PV is returned with len + 1 (for the NUL) bytes of storage
allocated, accessible via SvPVX. In both cases the SV has value undef.

    SV*  newSV(0);   /* no storage allocated */
    SV*  newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated */

To change the value of an *already-existing* SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, int);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

STRLEN is an integer type (Size_t, usually defined as size_t in
config.h) guaranteed to be large enough to represent the size of
any string that perl can handle.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char * ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>.
Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for a trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L<Magic Virtual Tables> later in this document.
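
As a rough illustration of the bookkeeping described above (a current
length, an allocated length, grow-only allocation, and a trailing NUL kept
by the appending code), here is a small standalone C model. It is not
Perl's implementation: the type C<toy_sv> and the functions C<toy_grow>
and C<toy_catpvn> are hypothetical names invented for this sketch, and
only loosely mirror C<SvGROW> and C<sv_catpvn>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of a string SV; not Perl's real structure. */
typedef struct {
    char  *pv;   /* start of the string buffer (compare SvPVX) */
    size_t cur;  /* current string length, excluding the NUL (compare SvCUR) */
    size_t len;  /* bytes allocated (compare SvLEN) */
} toy_sv;

/* Loosely like SvGROW: make sure at least newlen bytes are allocated.
 * It only ever grows the buffer. Returns the (possibly moved) buffer,
 * or NULL if the allocation failed. */
static char *toy_grow(toy_sv *sv, size_t newlen)
{
    if (newlen > sv->len) {
        char *p = realloc(sv->pv, newlen);
        if (p == NULL)
            return NULL;
        sv->pv = p;
        sv->len = newlen;
    }
    return sv->pv;
}

/* Loosely like sv_catpvn: append n bytes (which may contain NULs)
 * and keep a trailing NUL after the data. Returns 0 on success. */
static int toy_catpvn(toy_sv *sv, const char *src, size_t n)
{
    assert(sv != NULL && src != NULL);
    /* The grow step does not add the NUL byte itself, so ask for +1. */
    if (toy_grow(sv, sv->cur + n + 1) == NULL)
        return -1;
    memcpy(sv->pv + sv->cur, src, n);
    sv->cur += n;
    sv->pv[sv->cur] = '\0';
    return 0;
}
```

Starting from an empty C<toy_sv> (all fields zero), two appends grow the
buffer twice, while a later request for a smaller size leaves the
allocation untouched.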

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. Its
address can be used whenever an C<SV*> is needed.

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain Boolean
TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their addresses can
be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the first
line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV.
It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

  % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
  SV = PVIV(0x8128450) at 0x81340f0
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    IV = 1  (OFFSET)
    PV = 0x8135781 ( "1" . ) "2345"\0
    CUR = 4
    LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvLEN>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
routine rather than directly, so SvIOK is false. Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(I32 num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on AVs:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    I32   av_len(AV*);
    SV**  av_fetch(AV*, I32 key, I32 lval);
    SV**  av_store(AV*, I32 key, SV* val);

The C<av_len> function returns the highest index value in the array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, I32 key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on HVs:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

These two functions check whether a hash table entry exists, and delete it,
respectively.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void  hv_clear(HV*);
    void  hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. The C<hv_undef> function
deletes both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value is returned as the SV* result */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:

    hash = 0;
    while (klen--)
        hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5);          /* after 5.6 */

The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.
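
To make the hash formula quoted above concrete, here is a standalone C
rendering of it as a function. This is a sketch of the formula as written
in this document, not a copy of the C<PERL_HASH> macro itself: the name
C<toy_perl_hash> is hypothetical, C<uint32_t> stands in for C<U32>, and
treating the key bytes as unsigned is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical standalone version of the formula quoted in the text.
 * Assumption: key bytes are read as unsigned values. */
static uint32_t toy_perl_hash(const char *key, size_t klen)
{
    const unsigned char *p = (const unsigned char *)key;
    uint32_t hash = 0;

    assert(key != NULL || klen == 0);
    while (klen--)
        hash = (hash * 33) + *p++;   /* result is reduced modulo 2**32 */
    hash = hash + (hash >> 5);       /* the step added in version 5.6 */
    return hash;
}
```

For the one-byte key "a" (byte value 97) the loop yields 97, and the final
step gives 97 + (97 >> 5) = 97 + 3 = 100.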

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*   hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*   hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool  hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*   hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*   hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 References

References are a special type of scalar that point to other data types
(including references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar

See the sv.h header file for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In the
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference. The C<stash> argument specifies
which class the reference will belong to. See
L<Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

Upgrades rv to reference if not already one.
Creates new SV for rv to
point to. If C<classname> is non-null, the SV is blessed into the specified
class. SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

Copies integer, unsigned integer or double into an SV whose reference is C<rv>. SV is blessed
if C<classname> is non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

Copies the pointer value (I<the address, not the string!>) into an SV whose
reference is rv. SV is blessed if C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, PV pv);

Copies string into an SV whose reference is C<rv>. Set length to 0 to let
Perl calculate the string length. SV is blessed if C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, PV pv, STRLEN length);

Tests whether the SV is blessed into the specified class. It does not
check inheritance relationships.

    int  sv_isa(SV* sv, const char* name);

Tests whether the SV is a reference to a blessed object.

    int  sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. SV can be either
a reference to a blessed object or a string containing a class name. This
is the function implementing the C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", TRUE);
    AV*  get_av("package::varname", TRUE);
    HV*  get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<TRUE> argument to enable certain extra features. Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

  Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

  Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten. At the internal level, however, reference counts can be
manipulated with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument. The C<newRV_inc> function, you will recall,
creates a reference to the specified argument. As a side effect,
it increments the argument's reference count. If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one. Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two. Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't! Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates. This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>. Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.

There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later". Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function. The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, SAVETMPS and FREETMPS.
See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
For example an SV which is created just to pass a number to a called sub
is made mortal to have it cleaned up automatically when the stack is popped.
Similarly results returned by XSUBs (which go on the stack) are often
made mortal.
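
The idea of mortalization as a deferred decrement can be modelled in a few
lines of standalone C. The sketch below is only a toy: C<toy_xv>,
C<toy_new>, C<toy_inc>, C<toy_dec>, C<toy_2mortal> and C<toy_free_tmps> are
hypothetical names, the fixed-size pending list is an assumption made for
brevity, and nothing here is taken from Perl's actual implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model; none of these names are part of the Perl API. */
typedef struct { int refcnt; } toy_xv;

static int     toy_live = 0;     /* number of toy_xv objects not yet freed */
static toy_xv *toy_tmps[16];     /* values with a pending deferred decrement */
static int     toy_tmps_n = 0;

/* New values start their life with a reference count of 1. */
static toy_xv *toy_new(void)
{
    toy_xv *xv = malloc(sizeof *xv);
    assert(xv != NULL);
    xv->refcnt = 1;
    toy_live++;
    return xv;
}

static toy_xv *toy_inc(toy_xv *xv) { xv->refcnt++; return xv; }

/* When the count drops to 0 the value is destroyed. */
static void toy_dec(toy_xv *xv)
{
    if (--xv->refcnt == 0) {
        free(xv);
        toy_live--;
    }
}

/* "Mortalize": record one decrement to be performed later. */
static toy_xv *toy_2mortal(toy_xv *xv)
{
    assert(toy_tmps_n < 16);
    toy_tmps[toy_tmps_n++] = xv;
    return xv;
}

/* Perform every deferred decrement (the role FREETMPS plays in the text). */
static void toy_free_tmps(void)
{
    while (toy_tmps_n > 0)
        toy_dec(toy_tmps[--toy_tmps_n]);
}
```

In this model, C<toy_2mortal(toy_new())> is freed by the next
C<toy_free_tmps()>, mortalizing the same value twice decrements it twice,
and a value whose count was raised to two but mortalized only once
survives with a count of one, which mirrors the leak described above.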

To create a mortal variable, use the functions:

    SV*  sv_newmortal()
    SV*  sv_2mortal(SV*)
    SV*  sv_mortalcopy(SV*)

The first call creates a mortal SV (with no value), the second converts an existing
SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
third creates a mortal copy of an existing SV.
Because C<sv_newmortal> gives the new SV no value, it must normally be given one
via C<sv_setpv>, C<sv_setiv>, etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that is multiple C statements, it is quite common to see this idiom instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

You should be careful about creating mortal variables. Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times. Thinking of "Mortalization"
as deferred C<SvREFCNT_dec> should help to minimize such problems.
For example if you are passing an SV which you I<know> has high enough REFCNT
to survive its use on the stack you need not do any mortalization.
If you are not sure then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or
making a C<sv_mortalcopy> is safer.

The mortal routines are not just for SVs -- AVs and HVs can be
made mortal by passing their address (type-casted to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A "stash" is a hash that contains all of the different objects that
are contained within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value).
This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called "PL_defstash" that holds the items that exist
in the "main" package. To get at the items in other packages, append the
string "::" to the package name. The items in the "Foo" package are in
the stash "Foo::" in PL_defstash. The items in the "Bar::Baz" package are
in the stash "Baz::" in "Bar::"'s stash.

To get the stash pointer for a particular package, use the function:

    HV*  gv_stashpv(const char* name, I32 create)
    HV*  gv_stashsv(SV*, I32 create)

The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. The C<create> flag will create a new package if it is set.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternatively, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV*  SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char*  HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV*  sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int  dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", TRUE);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Magic Variables

[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        SV*         mg_obj;
        char*       mg_ptr;
        I32         mg_len;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the sv_magic function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>. Perl then continues by adding new magic
to the beginning of the linked list of magical features. Any prior entry
of the same type of magic is deleted. Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null and C<namlen> E<gt>= 0 a malloc'd
copy of the name is stored in the C<mg_ptr> field.

The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the "Magic Virtual Table" section below. The C<how> argument is also
stored in the C<mg_type> field. The value of C<how> should be chosen
from the set of macros C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented.
If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer,
then C<obj> is merely stored, without the reference count being incremented.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function sv_unmagic:

    void sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL> ("Magic Virtual Table"), a structure of function pointers that
handle the various operations that might be applied to that variable.

The C<MGVTBL> has five pointers to the following routine types:

    int  (*svt_get)(SV* sv, MAGIC* mg);
    int  (*svt_set)(SV* sv, MAGIC* mg);
    U32  (*svt_len)(SV* sv, MAGIC* mg);
    int  (*svt_clear)(SV* sv, MAGIC* mg);
    int  (*svt_free)(SV* sv, MAGIC* mg);

This MGVTBL structure is set at compile-time in C<perl.h> and there are
currently 19 types (or 21 with overloading turned on). These different
structures contain pointers to various routines that perform additional
actions depending on which function is being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.
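
The dispatch described by the table above can be modelled in a few lines of
standalone C. This is only a sketch under simplifying assumptions: the
C<toy_*> types and functions and the C<ext_*> hooks are invented for the
illustration, the table has two hooks instead of five, and the "SV" is a bare
integer. It is not how perl's own F<mg.c> is written; it only shows the
shape of "walk the magic chain and call each non-NULL hook".

```c
#include <stddef.h>

typedef struct toy_sv toy_sv;
typedef struct toy_magic toy_magic;

/* Reduced counterpart of MGVTBL: only two of the five hooks. */
typedef struct {
    int (*get)(toy_sv *sv, toy_magic *mg);
    int (*set)(toy_sv *sv, toy_magic *mg);
} toy_vtbl;

struct toy_magic {
    toy_magic      *moremagic;  /* next entry in the chain */
    const toy_vtbl *virt;       /* hook table, may be NULL */
    char            type;
};

struct toy_sv {
    int        value;
    toy_magic *magic;           /* head of the chain, or NULL */
};

/* Run every non-NULL "get" hook before the value is read. */
static int toy_fetch(toy_sv *sv)
{
    toy_magic *mg;
    for (mg = sv->magic; mg; mg = mg->moremagic)
        if (mg->virt && mg->virt->get)
            mg->virt->get(sv, mg);
    return sv->value;
}

/* Store the value, then run every non-NULL "set" hook. */
static void toy_store(toy_sv *sv, int value)
{
    toy_magic *mg;
    sv->value = value;
    for (mg = sv->magic; mg; mg = mg->moremagic)
        if (mg->virt && mg->virt->set)
            mg->virt->set(sv, mg);
}

/* Example hooks: "get" refreshes the value from an external variable,
 * "set" copies it back, loosely in the way $! mirrors errno. */
static int external_state = 7;

static int ext_get(toy_sv *sv, toy_magic *mg)
{
    (void)mg;
    sv->value = external_state;
    return 0;
}

static int ext_set(toy_sv *sv, toy_magic *mg)
{
    (void)mg;
    external_state = sv->value;
    return 0;
}

static const toy_vtbl ext_vtbl = { ext_get, ext_set };
```

A value with an empty chain behaves like a plain variable; attaching one
C<toy_magic> entry that points at C<ext_vtbl> makes every fetch and store go
through the hooks.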

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called. All the various routines for the various magical types begin
with C<magic_>. NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.

The current kinds of Magic Virtual Tables are:

    mg_type
    (old-style char and macro)   MGVTBL          Type of magic
    --------------------------   ------          ----------------------------
    \0 PERL_MAGIC_sv             vtbl_sv         Special scalar variable
    A  PERL_MAGIC_overload       vtbl_amagic     %OVERLOAD hash
    a  PERL_MAGIC_overload_elem  vtbl_amagicelem %OVERLOAD hash element
    c  PERL_MAGIC_overload_table (none)          Holds overload table (AMT)
                                                 on stash
    B  PERL_MAGIC_bm             vtbl_bm         Boyer-Moore (fast string search)
    D  PERL_MAGIC_regdata        vtbl_regdata    Regex match position data
                                                 (@+ and @- vars)
    d  PERL_MAGIC_regdatum       vtbl_regdatum   Regex match position data
                                                 element
    E  PERL_MAGIC_env            vtbl_env        %ENV hash
    e  PERL_MAGIC_envelem        vtbl_envelem    %ENV hash element
    f  PERL_MAGIC_fm             vtbl_fm         Formline ('compiled' format)
    g  PERL_MAGIC_regex_global   vtbl_mglob      m//g target / study()ed string
    I  PERL_MAGIC_isa            vtbl_isa        @ISA array
    i  PERL_MAGIC_isaelem        vtbl_isaelem    @ISA array element
    k  PERL_MAGIC_nkeys          vtbl_nkeys      scalar(keys()) lvalue
    L  PERL_MAGIC_dbfile         (none)          Debugger %_<filename
    l  PERL_MAGIC_dbline         vtbl_dbline     Debugger %_<filename element
    m  PERL_MAGIC_mutex          vtbl_mutex      ???
    o  PERL_MAGIC_collxfrm       vtbl_collxfrm   Locale collate transformation
    P  PERL_MAGIC_tied           vtbl_pack       Tied array or hash
    p  PERL_MAGIC_tiedelem       vtbl_packelem   Tied array or hash element
    q  PERL_MAGIC_tiedscalar     vtbl_packelem   Tied scalar or handle
    r  PERL_MAGIC_qr             vtbl_qr         precompiled qr// regex
    S  PERL_MAGIC_sig            vtbl_sig        %SIG hash
    s  PERL_MAGIC_sigelem        vtbl_sigelem    %SIG hash element
    t  PERL_MAGIC_taint          vtbl_taint      Taintedness
    U  PERL_MAGIC_uvar           vtbl_uvar       Available for use by extensions
    v  PERL_MAGIC_vec            vtbl_vec        vec() lvalue
    x  PERL_MAGIC_substr         vtbl_substr     substr() lvalue
    y  PERL_MAGIC_defelem        vtbl_defelem    Shadow "foreach" iterator
                                                 variable / smart parameter
                                                 vivification
    *  PERL_MAGIC_glob           vtbl_glob       GV (typeglob)
    #  PERL_MAGIC_arylen         vtbl_arylen     Array length ($#ary)
    .  PERL_MAGIC_pos            vtbl_pos        pos() lvalue
    <  PERL_MAGIC_backref        vtbl_backref    ???
    ~  PERL_MAGIC_ext            (none)          Available for use by extensions

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is used to represent some kind of composite type (a list
or a hash), and the lowercase letter is used to represent an element of
that composite type. Some internals code makes use of this case
relationship.

The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
specifically for use by extensions and will not be used by perl itself.
Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information
to variables (typically objects). This is especially useful because
there is no way for normal perl code to corrupt this private information
(unlike using extra elements of a hash object).

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed.
The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(pTHX_ IV, SV*);
        I32 (*uf_set)(pTHX_ IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below. Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict. Typically only using the magic on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it may also be appropriate to add an I32
'signature' at the top of the private data area and check that.

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.
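
The 'signature' check suggested above for C<PERL_MAGIC_ext> private data
could look roughly like the following. This is a standalone sketch, not
code from any real extension: C<myext_data>, C<MYEXT_SIGNATURE> and
C<myext_check> are hypothetical names, C<int32_t> stands in for perl's
C<I32>, and the C<ptr>/C<len> arguments stand in for the C<mg_ptr> and
C<mg_len> fields of a C<MAGIC> found with C<mg_find>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary marker value chosen for this illustration. */
#define MYEXT_SIGNATURE 0x4D594558

typedef struct {
    int32_t signature;  /* must stay the first member */
    int     handle;     /* the extension's actual private data */
} myext_data;

/* Return the private data if it carries our signature and has the
 * expected size, NULL otherwise (e.g. another extension's magic). */
static myext_data *myext_check(void *ptr, size_t len)
{
    int32_t sig;

    if (ptr == NULL || len != sizeof(myext_data))
        return NULL;
    memcpy(&sig, ptr, sizeof sig);  /* read the leading signature */
    if (sig != MYEXT_SIGNATURE)
        return NULL;
    return (myext_data *)ptr;
}
```

Checking the length as well as the signature is a cheap extra guard; it is
a design choice of this sketch, not something the text above requires.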

=head2 Finding Magic

    MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */

This routine returns a pointer to the C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned. Also,
if the SV is not of type SVt_PVMG, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has. If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET, etc methods. To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps - firstly it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class -
see L<Calling Perl Routines from within C Programs> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", TRUE);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>. It may also return NULL, indicating that the value did not
actually need to be stored in the array. [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object. If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>. Note the value so returned does not
need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object. Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes.
They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects. Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants. The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax. The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

    {
      local $var = 2;
      ...
    }

This construction is I<approximately> equivalent to

    {
      my $oldvar = $var;
      $var = 2;
      ...
      $var = $oldvar;
    }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
more efficient as well.
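
The save-then-restore bookkeeping behind C<local> can be sketched in plain
C as a small stack of "where was it, what was the old value" records. The
code below is a toy model written for this illustration only; C<toy_enter>,
C<toy_save_int> and C<toy_leave> are hypothetical names, the stack has a
fixed size, and it handles only C<int> variables and no non-local exits.

```c
#include <stddef.h>

#define SAVE_MAX 64

typedef struct {
    int *where;   /* variable that was saved */
    int  old;     /* value it had at save time */
} save_entry;

static save_entry save_stack[SAVE_MAX];
static size_t     save_ix = 0;

/* Open a pseudo-block: remember how deep the save stack is now. */
static size_t toy_enter(void)
{
    return save_ix;
}

/* Record the current value of *p so that toy_leave() can put it back.
 * Returns 1 on success, 0 if the save stack is full. */
static int toy_save_int(int *p)
{
    if (save_ix == SAVE_MAX)
        return 0;
    save_stack[save_ix].where = p;
    save_stack[save_ix].old   = *p;
    save_ix++;
    return 1;
}

/* Close the pseudo-block: undo saves in reverse order down to mark. */
static void toy_leave(size_t mark)
{
    while (save_ix > mark) {
        save_ix--;
        *save_stack[save_ix].where = save_stack[save_ix].old;
    }
}
```

Because restoration is driven by the stack depth recorded on entry, nested
pseudo-blocks unwind independently: leaving the inner one restores only what
was saved inside it.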

There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization must
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.

Inside such a I<pseudo-block> the following services are available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of enclosing I<pseudo-block>.

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> would be decremented at the end of
I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope. These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count. This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of I<pseudo-block>.

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed. If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

    SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=back

The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s). Where the above macros take C<int>, a similar
function takes C<int *>.

=over 4

=item C<SV* save_scalar(GV *gv)>

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=item C<HV* save_hash(GV *gv)>

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

Duplicates the current value of C<SV>. On exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<SV> is restored
from the stored copy.

=item C<void save_list(SV **sarg, I32 maxsarg)>

A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.

=item C<SV* save_svref(SV **sptr)>

Similar to C<save_scalar>, but will reinstate an C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>. People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument. Argument 0 is the first argument passed in the
Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives. However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro. The values pushed will often need to be "mortal" (see
L</Reference Counts and Mortality>).

    PUSHs(sv_2mortal(newSViv(an_integer)))
    PUSHs(sv_2mortal(newSVpv("Some String",0)))
    PUSHs(sv_2mortal(newSVnv(3.141592)))

Now, in the Perl program calling C<tzname>, the two values will be assigned
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method of pushing values on the stack is
to use the macro:

    XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.

Despite what earlier versions of this document suggested, the macros
C<PUSHi>, C<PUSHn> and C<PUSHp> are I<not> suited to XSUBs which return
multiple results; see L</Putting a C value on Perl stack>.

For more information, consult L<perlxs> and L<perlxstut>.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program. These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, register char**);

The routine most often used is C<call_sv>.
The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine. The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of arguments that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack. The tools for doing so include the
following macros and functions:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.

=head2 Memory Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section. The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

It is suggested that you enable the version of malloc that is distributed
with Perl. It keeps pools of various sizes of unallocated memory in
order to satisfy allocation requests more quickly. However, on some
platforms, it may cause spurious malloc or free errors.

    New(x, pointer, number, type);
    Newc(x, pointer, number, type, cast);
    Newz(x, pointer, number, type);

These three macros are used to initially allocate memory.

The first argument C<x> was a "magic cookie" that was used to keep track
of who called the macro, to help when debugging memory problems.
However,
the current code makes no use of this feature (most Perl developers now
use run-time memory checkers), so this argument can be any number.

The second argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The third and fourth arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated. The argument
C<type> is passed to C<sizeof>. The final argument to C<Newc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero>
to zero out all the newly allocated memory.

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer)

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
match those of C<New> and C<Newc> with the exception of not needing the
"magic cookie" argument.

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory. The C<source> and C<dest> arguments point to the source and
destination starting points. Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
function).

=head2 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used. This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with.
All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head2 Putting a C value on Perl stack

A lot of opcodes (an opcode is an elementary operation in the internal perl
stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L<Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on stack.

The macro to put this target on stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[pni]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack. The following code will not do what you think:

    XPUSHi(10);
    XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20. If you need to push multiple different values, use C<XPUSHs>,
which bypasses C<TARG>.

On a related note, if you do use C<(X)PUSH[npi]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>.
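
The pitfall with C<XPUSHi(10); XPUSHi(20);> comes purely from pointer
aliasing, and can be reproduced in a few lines of ordinary C. The sketch
below is a toy model with invented names (C<toy_push_targ>,
C<toy_push_fresh>), a fixed-size stack of C<int*> instead of C<SV*>, and no
connection to perl's real macros beyond the idea they illustrate.

```c
#include <stddef.h>

#define STACK_MAX 8

static int    targ;               /* the single reusable target */
static int    fresh[STACK_MAX];   /* storage for independent values */
static int   *stack[STACK_MAX];   /* stack of pointers to values */
static size_t sp = 0;

/* Models XPUSHi: overwrite the target, then push its address. */
static void toy_push_targ(int v)
{
    if (sp == STACK_MAX)
        return;
    targ = v;
    stack[sp++] = &targ;
}

/* Models pushing a newly created value: every push gets its own slot. */
static void toy_push_fresh(int v)
{
    if (sp == STACK_MAX)
        return;
    fresh[sp] = v;
    stack[sp] = &fresh[sp];
    sp++;
}
```

After two calls to C<toy_push_targ>, both stack entries point at the same
C<targ>, which holds the last value written; two calls to C<toy_push_fresh>
leave two distinct values on the stack.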

=head2 Scratchpads

The question remains on when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current unit --
a subroutine or a file (for opcodes for statements outside of
subroutines) -- is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current
unit.

A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes. One can deduce that an SV lives on a scratchpad
by looking at its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set.

The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

In fact it is not 100% true that a compiled unit contains a pointer to
the scratchpad AV. In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe B<threads>. Both
these can create several execution pointers going into the same
subroutine. For the subroutine-child to not write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.
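
The "array of scratchpads, grown on demand" scheme can be modelled in
standalone C as follows. This is a simplified sketch, not perl's pad code:
the C<toy_*> names are invented, a pad holds a single C<int> temporary, and
unlike perl (where the array starts with one element) the model starts with
an empty array and creates even the first pad lazily.

```c
#include <stdlib.h>

typedef struct {
    int temp;         /* stands in for the targets and lexicals */
} toy_pad;

typedef struct {
    toy_pad **pads;   /* pads[d-1] is the scratchpad for depth d */
    size_t    npads;  /* current length of the array */
    size_t    depth;  /* recursion depth, 0 when the sub is not running */
} toy_sub;

/* Enter the sub: bump the depth and, if this depth has never been
 * reached before, create a new pad and push it onto the array.
 * Returns the pad for this call, or NULL on allocation failure. */
static toy_pad *toy_sub_enter(toy_sub *sub)
{
    sub->depth++;
    if (sub->depth > sub->npads) {
        toy_pad **grown = realloc(sub->pads, sub->depth * sizeof *grown);
        if (!grown) {
            sub->depth--;
            return NULL;
        }
        sub->pads = grown;
        grown[sub->npads] = calloc(1, sizeof **grown);
        if (!grown[sub->npads]) {
            sub->depth--;
            return NULL;
        }
        sub->npads++;
    }
    return sub->pads[sub->depth - 1];
}

/* Leave the sub: the pad stays in the array for the next call. */
static void toy_sub_leave(toy_sub *sub)
{
    if (sub->depth > 0)
        sub->depth--;
}
```

A recursive call gets a pad distinct from its caller's, so the caller's
temporaries survive; once both calls return, a later top-level call reuses
the first pad rather than allocating a new one.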

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

    $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:

    $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-D optimize=-g> on C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line. The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), only 3 of them are
not optimized away (one per number in the left column).
The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, thus it is
C<3 4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the last.

There are also two other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.
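
The two kinds of links described above (parent to children via a first
child plus sibling pointers, and the separate execution-order thread) can
be imitated with a toy structure. The following is a sketch only:
C<ToyOp> and its fields C<first>, C<sibling> and C<next> are simplified,
hypothetical stand-ins for the real op structures, and the tree built is
the C<$b + $c> listing shown earlier.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified op node. */
typedef struct ToyOp ToyOp;
struct ToyOp {
    const char *name;
    ToyOp *first;     /* first child, in the spirit of op_first */
    ToyOp *sibling;   /* next child of the same parent, like op_sibling */
    ToyOp *next;      /* execution-order thread, like op_next */
};

/* Count a node's children by walking the sibling chain. */
size_t toy_count_kids(const ToyOp *o)
{
    size_t n = 0;
    for (const ToyOp *kid = o->first; kid; kid = kid->sibling)
        n++;
    return n;
}

/* Follow the execution thread from 'start', writing the op names
 * separated by spaces into 'buf'. Returns the number of ops visited. */
size_t toy_exec_order(const ToyOp *start, char *buf, size_t buflen)
{
    size_t steps = 0;
    if (buflen == 0)
        return 0;
    buf[0] = '\0';
    for (const ToyOp *o = start; o; o = o->next) {
        size_t used = strlen(buf);
        size_t need = strlen(o->name) + (steps ? 1 : 0);
        if (used + need + 1 > buflen)
            break;                      /* out of room: stop early */
        if (steps)
            strcat(buf, " ");
        strcat(buf, o->name);
        steps++;
    }
    return steps;
}

/* Wire up five nodes for $b + $c: n[0] is add, n[1] and n[2] are the
 * nulled-out rv2sv ops, n[3] and n[4] are the gvsv ops.
 * Returns the first op to execute. */
ToyOp *toy_build_add_tree(ToyOp n[5])
{
    enum { ADD, NULL_B, NULL_C, GVSV_B, GVSV_C };
    memset(n, 0, 5 * sizeof n[0]);
    n[ADD].name = "add";
    n[NULL_B].name = "null";
    n[NULL_C].name = "null";
    n[GVSV_B].name = "gvsv";
    n[GVSV_C].name = "gvsv";

    /* Tree shape: add has two null kids, each with one gvsv kid. */
    n[ADD].first = &n[NULL_B];
    n[NULL_B].sibling = &n[NULL_C];
    n[NULL_B].first = &n[GVSV_B];
    n[NULL_C].first = &n[GVSV_C];

    /* Execution thread skips the null ops entirely. */
    n[GVSV_B].next = &n[GVSV_C];
    n[GVSV_C].next = &n[ADD];
    return &n[GVSV_B];
}
```

The point of the sketch is that the tree has five nodes while the
execution thread visits only three, because the nulled ops remain in the
tree but are bypassed by the thread.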

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass. This is optimization by
so-called "check routines". The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread. Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, the check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines (or C<convert>) (which in turn are
called from F<perly.y>).

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called the returned node is
checked for being compile-time executable. If it is (the value is
judged to be constant) it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead. The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.

=head2 Compile pass 2: context propagation

When a context for a part of compile tree is known, it is propagated
down through the tree.
At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of free()ing (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). These optimizations are
done in the subroutine peep(). Optimizations performed at this stage
are subject to the same restrictions as in pass 2.

=head2 Pluggable runops

The compile tree is executed in a runops function. There are two runops
functions in F<run.c>. C<Perl_runops_debug> is used with DEBUGGING and
C<Perl_runops_standard> is used otherwise. For fine control over the
execution of the compile tree it is possible to provide your own runops
function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs. Then, in the BOOT section of your XS
file, add the line:

  PL_runops = my_runops;

This function should be as efficient as possible to keep your programs
running as fast as possible.

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the
subroutines in a package like so: (Thankfully, these are all xsubs, so
there is no op tree)

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

and C<Perl_dump_all>, which dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use. This smells a lot like an object, and
there are ways for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure,
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.

Two macros control the major Perl build flavors: MULTIPLICITY and
USE_5005THREADS.
The MULTIPLICITY build has a C structure
that packages all the interpreter state, and there is a similar thread-specific
data structure under USE_5005THREADS. In both cases,
PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
support for passing in a "hidden" first argument that represents
whichever of these data structures is in use.

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin C<S_> are private
(think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a
function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), send mail via
L<perlbug> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing. To solve this, the subroutines are named and
declared in a particular way. Here's a typical start of a static
function used within the Perl guts:

  STATIC void
  S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in future.

A public function (i.e.
part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

  void
  Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv)

C<pTHX_> is one of a number of macros (in perl.h) that hide the
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
first argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.

When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setsv>. It expands into
something like this:

    #ifdef PERL_IMPLICIT_CONTEXT
    #  define sv_setsv(a,b)      Perl_sv_setsv(aTHX_ a, b)
       /* can't do this for vararg functions, see below */
    #else
    #  define sv_setsv           Perl_sv_setsv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setsv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance.
Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead
it does dTHX; to get the context from thread-local storage. We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance. (Passing an arg is
cheaper than grabbing it from thread-local storage.)

You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow. The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with PERL_IMPLICIT_CONTEXT enabled.

There are three ways to do this. First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever XSUB.h is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

    sv_setsv(asv, bsv);

in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:

    Perl_sv_setsv(Perl_get_context(), asv, bsv);

or to this otherwise:

    Perl_sv_setsv(asv, bsv);

You have to do nothing new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your Foo.xs:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    static SV *my_private_function(int arg1, int arg2);

    static SV *
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        ... call many Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API. (You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.
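
The macro convention behind all of this can be imitated in a few lines
of standalone C. What follows is a sketch under stated assumptions, not
the real definitions from perl.h: C<ToyInterp>, C<TOY_IMPLICIT_CONTEXT>,
C<pTOY>, C<pTOY_>, C<aTOY>, C<aTOY_> and C<Toy_warn> are all hypothetical
names chosen to mirror C<pTHX>, C<pTHX_>, C<aTHX>, C<aTHX_> and a
C<Perl_>-prefixed function.

```c
#include <stddef.h>

/* Hypothetical interpreter state; the real one is far larger. */
typedef struct {
    int warnings;
} ToyInterp;

/* Comment this out to build the "no implicit context" flavor. */
#define TOY_IMPLICIT_CONTEXT

#ifdef TOY_IMPLICIT_CONTEXT
   /* The context is a hidden first parameter named my_toy. */
#  define pTOY          ToyInterp *my_toy
#  define pTOY_         pTOY,
#  define aTOY          my_toy
#  define aTOY_         aTOY,
#  define TOY_WARNINGS  (my_toy->warnings)
#else
   /* One global interpreter; the macros expand to (almost) nothing. */
   static ToyInterp toy_global;
#  define pTOY          void
#  define pTOY_
#  define aTOY
#  define aTOY_
#  define TOY_WARNINGS  (toy_global.warnings)
#endif

/* "Public" function: takes the context only if the build needs it. */
int Toy_warn(pTOY_ int n)
{
    TOY_WARNINGS += n;
    return TOY_WARNINGS;
}

/* Short name that silently passes the context along. */
#define toy_warn(n) Toy_warn(aTOY_ n)

/* A function with no explicit arguments uses the form
 * without the trailing underscore. */
int toy_warn_twice(pTOY)
{
    toy_warn(1);
    return toy_warn(1);
}
```

With C<TOY_IMPLICIT_CONTEXT> defined, a caller that has a variable named
C<my_toy> in scope can write C<toy_warn(5)> and the context is supplied
by the macro; without it, the same source compiles against the single
global structure.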

The third, even more efficient way is to ape how it is done within
the Perl guts:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    static SV *my_private_function(pTHX_ int arg1, int arg2);

    static SV *
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards.
If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

    /* do this before doing anything else with some_perl */
    PERL_SET_CONTEXT(some_perl);

    ... other Perl API calls on some_perl go here ...

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS
and USE_5005THREADS on Windows (see inside iperlsys.h).

This allows the ability to provide an extra pointer (called the "host"
environment) for all the system calls. This makes it possible for
all the system stuff to maintain their own state, broken down into
seven C structures. These are thin wrappers around the usual system
calls (see win32/perllib.c) for the default perl executable, but for a
more ambitious host (like the one that would do fork() emulation) all
the extra work needed to pretend that different interpreters are
actually different "processes", would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>.
(By convention,
static functions start with C<S_>.)

Inside the Perl core, you can get at the functions either with or
without the C<Perl_> prefix, thanks to a bunch of defines that live in
F<embed.h>. This header file is generated automatically from
F<embed.pl>. F<embed.pl> also creates the prototyping header files for
the internal functions, generates the documentation and a lot of other
bits and pieces. It's important that when you add a new function to the
core or change an existing one, you change the data in the table at the
end of F<embed.pl> as well. Here's a sample entry from that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The second column is the return type, the third column the name. Columns
after that are the arguments. The first column is a set of flags:

=over 3

=item A

This function is a part of the public API.

=item p

This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>.

=item d

This function has documentation using the C<apidoc> feature which we'll
look at in a second.

=back

Other available flags are:

=over 3

=item s

This is a static function and is defined as C<S_whatever>, and usually
called within the sources as C<whatever(...)>.

=item n

This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See
L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)

=item r

This function never returns; C<croak>, C<exit> and friends.

=item f

This function takes a variable number of arguments, C<printf> style.
The argument list should end with C<...>, like this:

    Afprd   |void   |croak          |const char* pat|...

=item M

This function is part of the experimental development API, and may change
or disappear without notice.

=item o

This function should not have a compatibility macro to define, say,
C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.

=item j

This function is not a member of C<CPerlObj>. If you don't know
what this means, don't use it.

=item x

This function isn't exported out of the Perl core.

=back

If you edit F<embed.pl>, you will need to run C<make regen_headers> to
force a rebuild of F<embed.h> and other auto-generated files.

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

        IVdf            IV in decimal
        UVuf            UV in decimal
        UVof            UV in octal
        UVxf            UV in hexadecimal
        NVef            NV %e-like
        NVff            NV %f-like
        NVgf            NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

        printf("IV is %"IVdf"\n", iv);

The IVdf will expand to whatever is the correct format for the IVs.

If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

        PTR2UV(pointer)
        PTR2IV(pointer)
        PTR2NV(pointer)
        INT2PTR(pointertotype, integer)

For example:

        IV  iv = ...;
        SV *sv = INT2PTR(SV*, iv);

and

        AV *av = ...;
        UV  uv = PTR2UV(av);

=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them - L<perlapi> is one
such manual which details all the functions which are available to XS
writers.
L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

 /*
 =for apidoc sv_setiv

 Copies an integer into the given SV.  Does not handle 'set' magic.  See
 C<sv_setiv_mg>.

 =cut
 */

Please try and supply some documentation if you add functions to the
Perl core.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF8.
UTF8 uses
a variable number of bytes to represent a character, instead of just
one. You can learn more about Unicode at http://www.unicode.org/

=head2 How can I recognise a UTF8 string?

You can't. This is because UTF8 data is stored in bytes just like
non-UTF8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking - this
is what makes Unicode input an interesting problem.

The API function C<is_utf8_string> can help; it'll tell you if a string
contains only valid UTF8 characters. However, it can't do the work for
you. On a character-by-character basis, C<is_utf8_char> will tell you
whether the current character in a string is valid UTF8.

=head2 How does UTF8 represent Unicode characters?

As mentioned above, UTF8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>; this
continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.

Assuming you know you're dealing with a UTF8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.
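
To make the byte layout concrete, here is a small standalone sketch of
the arithmetic involved. It does not use the Perl API:
C<toy_utf8_skip> and C<toy_utf8_encode> are hypothetical helpers written
for this illustration, the first playing a role similar to C<UTF8SKIP>,
and the encoder assumes standard UTF-8 of at most four bytes and does no
validation beyond a range check.

```c
#include <stddef.h>

/* Length in bytes of the sequence introduced by 'lead'.
 * A stray continuation byte or invalid lead is reported as 1
 * so that a caller can always make progress. */
size_t toy_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                       /* 0xxxxxxx: plain ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;                       /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;                       /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;                       /* 11110xxx */
    return 1;
}

/* Encode code point 'cp' into 'out' (room for 4 bytes assumed).
 * Returns the number of bytes written, or 0 if cp is out of range. */
size_t toy_utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Feeding 128, 192 and 2048 through C<toy_utf8_encode> reproduces the
boundaries described above: two bytes starting at 128, a new lead byte at
192, and three bytes from 2048 on.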

All bytes in a multi-byte UTF8 character will have the high bit set, so
you can test if you need to do something special with this character
like this:

    UV uv;

    if (*utf & 0x80)
        /* Must treat this as UTF8 */
        uv = utf8_to_uv(utf);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uv> to get the
value of the character; the inverse function C<uv_to_utf8> is available
for putting a UV into UTF8:

    if (uv >= 0x80)
        /* Must treat this as UTF8 */
        utf8 = uv_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF8 and non-UTF8
characters. You may not skip over UTF8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF8 characters;
for instance, if your UTF8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF8 string.
So don't do that!

=head2 How does Perl store UTF8 strings?

Currently, Perl deals with Unicode strings and non-Unicode strings
slightly differently. If a string has been identified as being UTF-8
encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
manipulate this flag with the following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable results.
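
Why the flag matters can be seen with a toy model of a flagged string.
This is a sketch under stated assumptions and not Perl's implementation:
C<ToyString> and C<toy_length> are hypothetical, with C<is_utf8> playing
the role of C<SVf_UTF8>, and the character count simply assumes
well-formed UTF-8.

```c
#include <stddef.h>

/* Hypothetical stand-in for an SV holding a string. */
typedef struct {
    const unsigned char *bytes;
    size_t len;        /* length in bytes */
    int    is_utf8;    /* plays the role of the SVf_UTF8 flag */
} ToyString;

/* Length in characters, in the spirit of Perl's length():
 * bytes when the flag is off, UTF-8 sequences when it is on. */
size_t toy_length(const ToyString *s)
{
    size_t chars = 0;

    if (!s->is_utf8)
        return s->len;

    /* Count every byte that is not a 10xxxxxx continuation byte. */
    for (size_t i = 0; i < s->len; i++) {
        if ((s->bytes[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

The same two bytes, 0xC3 0x88, give a length of 2 with the flag off and 1
with it on, which is exactly the kind of discrepancy that appears when
the flag is lost or set wrongly.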

The problem comes when you have, for instance, a string that isn't
flagged as UTF8, and contains a byte sequence that could be UTF8 -
especially when combining non-UTF8 and UTF8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set, and act accordingly:

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
    if (SvUTF8(sv))
        SvUTF8_on(nsv);

In fact, your C<frobnicate> function should be made aware of whether or
not it's dealing with UTF8 data, so that it can handle the string
appropriately.

=head2 How do I convert a string to UTF8?

If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
to upgrade one of the strings to UTF8. If you've got an SV, the easiest
way to do this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems.

Instead, C<bytes_to_utf8> will give you a UTF8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV.
There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a string is UTF8 or not. You can tell if an SV
is UTF8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
something should be UTF8. Treat the flag as part of the PV, even though
it's not - if you pass on the PV to somewhere, pass on the flag too.

=item *

If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
unless C<!(*s & 0x80)> in which case you can use C<*s>.

=item *

When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
C<uv < 0x80> in which case you can use C<*s = uv>.

=item *

Mixing UTF8 and non-UTF8 strings is tricky. Use C<bytes_to_utf8> to get
a new string which is UTF8 encoded. There are tricks you can use to
delay deciding whether you need to use a UTF8 string until you get to a
high character - C<HALF_UPGRADE> is one of those.

=back

=head1 Custom Operators

Custom operator support is a new experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>.)

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations.
This also means that you can
define your custom ops to be any op structure - unary, binary, list and
so on - you like.

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to make those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> as a key into the
C<PL_custom_op_descs> and C<PL_custom_op_names> hashes. This means you
need to enter a name and description for your op at the appropriate
place in the C<PL_custom_op_names> and C<PL_custom_op_descs> hashes.

Forthcoming versions of C<B::Generate> (version 1.0 and above) should
directly support the creation of custom ops by name; C<Opcodes::Custom>
will provide functions which make it trivial to "register" custom ops to
the Perl interpreter.

=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>.
It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

API Listing originally by Dean Roehrich E<lt>roehrich@cray.comE<gt>.

Modifications to autogenerate the API listing (L<perlapi>) by Benjamin
Stuhl.

=head1 SEE ALSO

perlapi(1), perlintern(1), perlxs(1), perlembed(1)