=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value". You might think that it is misnamed
because it is described as pointing only to strings. However, it is
possible to have it point to other things. For example, it could point
to an array of UVs. But using it for non-strings requires care, as the
underlying assumption of much of the internals is that PVs are just for
strings. Often, for example, a trailing C<NUL> is tacked on automatically.
The non-string use is documented only in this paragraph.)

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (Size_t, usually defined as size_t in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

In the unlikely case of a SV requiring more complex initialization, you
can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has the undef value.

    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                          * allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                      SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.
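
The C<strlen> warning above can be seen without any Perl headers. The
following is a minimal plain C sketch, not Perl API code:
C<length_seen_by_strlen>, C<payload> and C<payload_len> are hypothetical
names chosen for this illustration. It only shows why a buffer with an
embedded C<NUL> needs a length-taking routine such as C<sv_setpvn> rather
than one that measures the string itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Perl API): the length that any
 * strlen-based routine would compute for a buffer. */
static size_t length_seen_by_strlen(const char *buf)
{
    return strlen(buf);   /* stops at the first NUL byte */
}

/* Five bytes of payload with a NUL in the middle; the string literal
 * also supplies a terminating NUL, which is not counted below. */
static const char payload[] = "ab\0cd";
static const size_t payload_len = sizeof(payload) - 1;
```

Here the measured length is shorter than the real payload length, so
passing such a buffer to a routine that calls C<strlen> would silently
drop everything after the embedded C<NUL>.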

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a C<NUL> character.
If it is not C<NUL>-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a C<NUL>-terminated string.
Perl's own functions typically add a trailing C<NUL> for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl.
In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a C<NUL>.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

If you want to write to an existing SV's buffer and set its value to a
string, use SvPV_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
without extra copying:

    (void)SvPVbyte_force(sv, len);
    s = SvGROW(sv, len + needlen + 1);
    /* something that modifies up to needlen bytes at s+len, but
       modifies newlen bytes
         eg. newlen = read(fd, s + len, needlen);
       ignoring errors for these examples
     */
    s[len + newlen] = '\0';
    SvCUR_set(sv, len + newlen);
    SvUTF8_off(sv);
    SvSETMAGIC(sv);

If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn().
If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().

If you don't need the existing content of the SV, you can avoid some
copying with:

    sv_setpvn(sv, "", 0);
    s = SvGROW(sv, needlen + 1);
    /* something that modifies up to needlen bytes at s, but modifies
       newlen bytes
         eg. newlen = read(fd, s, needlen);
     */
    s[newlen] = '\0';
    SvCUR_set(sv, newlen);
    SvPOK_only(sv); /* also clears SVf_UTF8 */
    SvSETMAGIC(sv);

Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().

If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags(). That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:

    Newx(buf, somesize+1, char);
    /* ... fill in buf ... */
    buf[somesize] = '\0';
    sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
    /* buf now belongs to perl, don't release it */

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                      I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", 0);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example,
when interfacing with Perl code, the comparison will work correctly for:

    foo(undef);

But won't work when called as:

    $x = undef;
    foo($x);

So, to repeat: always use SvOK() to check whether an sv is defined.
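
Why the address comparison fails can be sketched with a toy model in
plain C. This is an illustration only: C<toy_sv>, C<toy_undef>,
C<toy_assign>, C<toy_ok> and C<is_undef_by_address> are hypothetical
names, and a real SV is far more involved than a flag and an integer.
The assumption being modelled is that C<$x = undef> copies the undefined
value into C<$x>'s own storage, so C<$x> never shares an address with the
single shared undef.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an SV: just a "defined" flag and an integer. */
typedef struct toy_sv {
    bool defined;
    long iv;
} toy_sv;

/* Toy stand-in for PL_sv_undef: one shared undefined value. */
static toy_sv toy_undef = { false, 0 };

/* Fragile check: tests identity with the shared undef, like comparing
 * an SV* against &PL_sv_undef. */
static bool is_undef_by_address(const toy_sv *sv)
{
    return sv == &toy_undef;
}

/* Robust check: inspects the value itself, in the spirit of SvOK(). */
static bool toy_ok(const toy_sv *sv)
{
    return sv->defined;
}

/* Models "$x = undef": the value is copied into the destination's own
 * storage; the destination keeps its own address. */
static void toy_assign(toy_sv *dst, const toy_sv *src)
{
    *dst = *src;
}
```

In this model a variable that was assigned undef is reported as undefined
by C<toy_ok> but is missed by C<is_undef_by_address>, which mirrors the
C<foo($x)> case.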

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L<AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly. (A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.
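
The pointer arithmetic behind the offset hack can be modelled in a few
lines of plain C. This is a simplified sketch rather than Perl's
implementation: C<toy_str>, C<toy_new>, C<toy_chop> and C<toy_free> are
hypothetical names, and the chopped byte count is kept in an ordinary
struct field here purely for clarity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string SV that supports the offset hack. */
typedef struct toy_str {
    char  *pv;      /* visible start of the string (like SvPVX)   */
    size_t cur;     /* visible length, excluding NUL (like SvCUR) */
    size_t len;     /* bytes available from pv onward (like SvLEN) */
    size_t offset;  /* number of bytes chopped off so far          */
} toy_str;

/* Allocate a toy string holding a copy of s plus its trailing NUL. */
static toy_str toy_new(const char *s)
{
    toy_str t;
    t.cur = strlen(s);
    t.len = t.cur + 1;
    t.pv = malloc(t.len);
    assert(t.pv != NULL);
    memcpy(t.pv, s, t.len);
    t.offset = 0;
    return t;
}

/* Drop n leading bytes without moving any data: advance the pointer
 * and shrink the visible and allocated lengths to match. */
static void toy_chop(toy_str *t, size_t n)
{
    assert(n <= t->cur);
    t->pv     += n;
    t->cur    -= n;
    t->len    -= n;
    t->offset += n;
}

/* The real start of the allocation is only needed when freeing. */
static void toy_free(toy_str *t)
{
    free(t->pv - t->offset);
    t->pv = NULL;
}
```

Chopping one byte from C<"12345"> in this model leaves a visible length
of 4 and an available length of 5, with the allocation still starting one
byte before the visible pointer.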

This is best demonstrated by example:

  % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
  SV = PVIV(0x8128450) at 0x81340f0
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    IV = 1  (OFFSET)
    PV = 0x8135781 ( "1" . ) "2345"\0
    CUR = 4
    LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
usually these macros will always return TRUE and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(SSize_t num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on it:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, SSize_t num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    SSize_t av_top_index(AV*);
    SV**    av_fetch(AV*, SSize_t key, I32 lval);
    SV**    av_store(AV*, SSize_t key, SV* val);

The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

A few more:

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, SSize_t key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", 0);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on it:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you).
The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

The first of these two functions checks if a hash table entry exists, and the
second deletes it.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. The C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value is returned as the SV* result */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", 0);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH> macro:

    PERL_HASH(hash, key, klen)

The exact implementation of this macro varies by architecture and version
of perl, and the return value may change per invocation, so the value
is only valid for the duration of a single perl process.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures.
These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use
C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code. In perl 5.20, storing
C<&PL_sv_undef> will create a read-only element, because the scalar
C<&PL_sv_undef> itself is stored, not a copy.

Similar problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including other references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not.
For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    < SVt_PVAV  Scalar
    SVt_PVAV    Array
    SVt_PVHV    Hash
    SVt_PVCV    Code
    SVt_PVGV    Glob (possibly a file handle)

See L<perlapi/svtype> for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L<Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

The following function upgrades C<rv> to a reference if it is not already
one, and creates a new SV for C<rv> to point to. If C<classname> is
non-null, the SV is blessed into the specified class. The SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

The following three functions copy integer, unsigned integer or double
into an SV whose reference is C<rv>. SV is blessed if C<classname> is
non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

The following function copies the pointer value (I<the address, not the
string!>) into an SV whose reference is C<rv>. SV is blessed if C<classname>
is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

The following function copies a string into an SV whose reference is C<rv>.
Set length to 0 to let Perl calculate the string length. SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
                      STRLEN length);

The following function tests whether the SV is blessed into the specified
class. It does not check inheritance relationships.

    int  sv_isa(SV* sv, const char* name);

The following function tests whether the SV is a reference to a blessed object.

    int  sv_isobject(SV* sv);

The following function tests whether the SV is derived from the specified
class. SV can be either a reference to a blessed object or a string
containing a class name. This is the function implementing the
C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", GV_ADD);
    AV*  get_av("package::varname", GV_ADD);
    HV*  get_hv("package::varname", GV_ADD);

Notice the use of GV_ADD as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<GV_ADD> argument to enable certain extra features. Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

  Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

  Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten. At the internal level, however, reference counts can be
manipulated with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument. The C<newRV_inc> function, you will recall,
creates a reference to the specified argument. As a side effect,
it increments the argument's reference count. If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one. Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two.
Now you return the reference from the XSUB routine and forget about the SV.
But Perl hasn't!  Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates.  This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>.  Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.

There are some convenience functions available that can help with the
destruction of xVs.  These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later".  Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function.  The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, SAVETMPS and FREETMPS.
See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
For example an SV which is created just to pass a number to a called sub
is made mortal to have it cleaned up automatically when it's popped off
the stack.  Similarly, results returned by XSUBs (which are pushed on the
stack) are often made mortal.
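
The bookkeeping described above can be pictured with a small standalone
model.  This is only a sketch in plain C, not Perl's implementation: the
names C<toy_sv>, C<toy_new>, C<toy_dec>, C<toy_newrv_inc>, C<toy_newrv_noinc>,
C<toy_2mortal> and C<toy_freetmps> are all hypothetical, the pool is a fixed
array, and "destroyed" simply means the count has reached zero.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16

/* Toy stand-in for an SV: only the reference count is modelled. */
typedef struct toy_sv {
    int refcnt;               /* 0 means "destroyed" in this model */
    struct toy_sv *referent;  /* target of a toy reference, else NULL */
} toy_sv;

static toy_sv  toy_pool[TOY_MAX];   /* fixed pool, slots are never reused */
static size_t  toy_pool_n;
static toy_sv *toy_tmps[TOY_MAX];   /* pending deferred decrements */
static size_t  toy_tmps_n;

/* Every new value starts life with a reference count of 1. */
static toy_sv *toy_new(void)
{
    assert(toy_pool_n < TOY_MAX);
    toy_sv *sv = &toy_pool[toy_pool_n++];
    sv->refcnt = 1;
    sv->referent = NULL;
    return sv;
}

/* Drop one count; destroying a reference also releases its referent. */
static void toy_dec(toy_sv *sv)
{
    if (sv == NULL || sv->refcnt == 0)
        return;
    if (--sv->refcnt == 0) {
        toy_dec(sv->referent);
        sv->referent = NULL;
    }
}

/* Reference that takes over the caller's count on the target. */
static toy_sv *toy_newrv_noinc(toy_sv *target)
{
    toy_sv *rv = toy_new();
    rv->referent = target;
    return rv;
}

/* Reference that adds its own count on the target. */
static toy_sv *toy_newrv_inc(toy_sv *target)
{
    target->refcnt++;
    return toy_newrv_noinc(target);
}

/* "Mortalize": remember that one decrement is owed later. */
static toy_sv *toy_2mortal(toy_sv *sv)
{
    assert(toy_tmps_n < TOY_MAX);
    toy_tmps[toy_tmps_n++] = sv;
    return sv;
}

/* The "short time later": perform every owed decrement. */
static void toy_freetmps(void)
{
    while (toy_tmps_n > 0)
        toy_dec(toy_tmps[--toy_tmps_n]);
}
```

In this model, a value wrapped with C<toy_newrv_inc> and then forgotten is
left with a count of one after its mortal reference is freed, which is the
leak described above; the same sequence with C<toy_newrv_noinc> brings the
count to zero.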

To create a mortal variable, use the functions:

    SV*  sv_newmortal()
    SV*  sv_2mortal(SV*)
    SV*  sv_mortalcopy(SV*)

The first call creates a mortal SV (with no value), the second converts an
existing SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and
the third creates a mortal copy of an existing SV.
Because C<sv_newmortal> gives the new SV no value, it must normally be given
one via C<sv_setpv>, C<sv_setiv>, etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that is multiple C statements, it is quite common to see this idiom instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

You should be careful about creating mortal variables.  Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times.  Thinking of
"Mortalization" as deferred C<SvREFCNT_dec> should help to minimize such
problems.  For example if you are passing an SV which you I<know> has a high
enough REFCNT to survive its use on the stack you need not do any
mortalization.  If you are not sure then doing an C<SvREFCNT_inc> and
C<sv_2mortal>, or making a C<sv_mortalcopy> is safer.

The mortal routines are not just for SVs; AVs and HVs can be
made mortal by passing their address (cast to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package.  Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value).
This GV in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package.  To get at the items in other packages, append the
string "::" to the package name.  The items in the C<Foo> package are in
the stash C<Foo::> in C<PL_defstash>.  The items in the C<Bar::Baz> package
are in the stash C<Baz::> in C<Bar::>'s stash.

To get the stash pointer for a particular package, use one of the functions:

    HV*  gv_stashpv(const char* name, I32 flags)
    HV*  gv_stashsv(SV*, I32 flags)

The first function takes a literal string, the second uses the string stored
in the SV.  Remember that a stash is just a hash table, so you get back an
C<HV*>.  The C<flags> argument will create a new package if it is set to
C<GV_ADD>.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want.  The default package is called C<main>.  If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternately, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV*  SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char*  HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV*  sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash.  The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.
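
The nesting rule given earlier in this section (C<Bar::Baz> lives in the
C<Baz::> entry of the C<Bar::> stash) can be made concrete with a few lines of
plain C.  This is a sketch of the naming rule only and calls nothing in the
Perl API; C<toy_stash_keys> and C<TOY_KEY_MAX> are hypothetical names, and
empty components such as a leading C<::> are not treated specially.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_KEY_MAX 64

/* Split a package name into the chain of stash keys that would be
 * followed, starting from the top-level stash.  Returns the number of
 * keys written, or 0 if a component does not fit in TOY_KEY_MAX. */
static size_t toy_stash_keys(const char *pkg,
                             char keys[][TOY_KEY_MAX], size_t max_keys)
{
    size_t n = 0;
    const char *p = pkg;

    assert(pkg != NULL);
    while (*p != '\0' && n < max_keys) {
        const char *sep = strstr(p, "::");
        size_t len = sep ? (size_t)(sep - p) : strlen(p);

        if (len + 3 > TOY_KEY_MAX)
            return 0;
        memcpy(keys[n], p, len);
        memcpy(keys[n] + len, "::", 3);   /* also copies the NUL */
        n++;
        p = sep ? sep + 2 : p + len;
    }
    return n;
}
```

Called with C<"Bar::Baz">, the function fills in two keys, C<Bar::> and
C<Baz::>, in the order in which the nested stashes would be walked.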

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference.  Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data.  For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data.  The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first.  This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int  dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", GV_ADD);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Read-Only Values

In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
flag bit with read-only scalars.
So the only way to test whether
C<sv_setsv>, etc., will raise a "Modification of a read-only value" error
in those versions is:

    SvREADONLY(sv) && !SvIsCOW(sv)

Under Perl 5.18 and later, C<SvREADONLY> only applies to read-only variables,
and, under 5.20, copy-on-write scalars can also be read-only, so the above
check is incorrect.  You just want:

    SvREADONLY(sv)

If you need to do this check often, define your own macro like this:

    #if PERL_VERSION >= 18
    # define SvTRULYREADONLY(sv) SvREADONLY(sv)
    #else
    # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
    #endif

=head2 Copy on Write

Perl implements a copy-on-write (COW) mechanism for scalars, in which
string copies are not immediately made when requested, but are deferred
until made necessary by one or the other scalar changing.  This is mostly
transparent, but one must take care not to modify string buffers that are
shared by multiple SVs.

You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>.

You can force an SV to make its own copy of its string buffer by calling
C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>.

If you want to make the SV drop its string buffer, use
C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply
C<sv_setsv(sv, NULL)>.

All of these functions will croak on read-only scalars (see the previous
section for more on those).

To test that your code is behaving correctly and not modifying COW buffers,
on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with
C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations
into crashes.  You will find it to be marvellously slow, so you may want to
skip perl's own tests.

=head2 Magic Variables

[This section still under construction.  Ignore everything here.  Post no
bills.  Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have.  These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        I32         mg_len;
        SV*         mg_obj;
        char*       mg_ptr;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the sv_magic function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>.
Perl then continues by adding new magic
to the beginning of the linked list of magical features.  Any prior entry
of the same type of magic is deleted.  Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable.  C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
whether C<namlen> is greater than zero or equal to zero respectively.  As a
special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
to contain an C<SV*> and is stored as-is with its REFCNT incremented.

The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L</Magic Virtual Tables> section below.  The C<how> argument is also
stored in the C<mg_type> field.
The value of
C<how> should be chosen from the set of macros
C<PERL_MAGIC_foo> found in F<perl.h>.  Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure.  If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented.  If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer,
then C<obj> is merely stored, without the reference count being incremented.

See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
to an SV.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function sv_unmagic:

    int sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

However, note that C<sv_unmagic> removes all magic of a certain C<type> from
the C<SV>.  If you want to remove only certain
magic of a C<type> based on the magic
virtual table, use C<sv_unmagicext> instead:

    int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL>, which is a structure of function pointers and stands for
"Magic Virtual Table" to handle the various operations that might be
applied to that variable.

The C<MGVTBL> has five (or sometimes eight) pointers to the following
routine types:

    int  (*svt_get)(SV* sv, MAGIC* mg);
    int  (*svt_set)(SV* sv, MAGIC* mg);
    U32  (*svt_len)(SV* sv, MAGIC* mg);
    int  (*svt_clear)(SV* sv, MAGIC* mg);
    int  (*svt_free)(SV* sv, MAGIC* mg);

    int  (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv,
                                          const char *name, I32 namlen);
    int  (*svt_dup)(MAGIC *mg, CLONE_PARAMS *param);
    int  (*svt_local)(SV *nsv, MAGIC *mg);

This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 32 types.  These different structures contain pointers to various
routines that perform additional actions depending on which function is
being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is
                        retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

    svt_copy            copy tied variable magic to a tied element
    svt_dup             duplicate a magic structure during thread cloning
    svt_local           copy magic to local value during 'local'

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called.  All the various routines for the various magical types begin
with C<magic_>.  NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.
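
The dispatch idea, a linked list of magic entries each pointing at a table of
optional hooks, can be shown with a standalone model.  This is a sketch in
plain C and not the code perl uses: C<toy_vtbl>, C<toy_magic>, C<toy_sv>,
C<toy_fetch>, C<toy_store> and the two example hooks are hypothetical names,
and only the "get" and "set" slots are modelled.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sv;
struct toy_magic;

/* Two-slot stand-in for a magic virtual table; a NULL slot is skipped. */
typedef struct {
    int (*get)(struct toy_sv *sv, struct toy_magic *mg);
    int (*set)(struct toy_sv *sv, struct toy_magic *mg);
} toy_vtbl;

typedef struct toy_magic {
    struct toy_magic *moremagic;   /* next entry in the list */
    const toy_vtbl   *virt;        /* hooks for this entry, may be NULL */
    char              type;
    void             *ptr;         /* private data for the hooks */
} toy_magic;

typedef struct toy_sv {
    long       value;
    toy_magic *magic;
} toy_sv;

/* Run every "get" hook before the value is read. */
static long toy_fetch(toy_sv *sv)
{
    assert(sv != NULL);
    for (toy_magic *mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virt != NULL && mg->virt->get != NULL)
            mg->virt->get(sv, mg);
    return sv->value;
}

/* Assign, then run every "set" hook after the value is stored. */
static void toy_store(toy_sv *sv, long v)
{
    assert(sv != NULL);
    sv->value = v;
    for (toy_magic *mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virt != NULL && mg->virt->set != NULL)
            mg->virt->set(sv, mg);
}

/* Example hook: count reads in the int that mg->ptr points at. */
static int toy_count_get(toy_sv *sv, toy_magic *mg)
{
    (void)sv;
    ++*(int *)mg->ptr;
    return 0;
}

/* Example hook: clamp stored values to at most 100. */
static int toy_clamp_set(toy_sv *sv, toy_magic *mg)
{
    (void)mg;
    if (sv->value > 100)
        sv->value = 100;
    return 0;
}

static const toy_vtbl toy_counting_vtbl = { toy_count_get, NULL };
static const toy_vtbl toy_clamping_vtbl = { NULL, toy_clamp_set };
```

A value carrying both entries has its stores clamped and its reads counted,
with neither hook needing to know about the other.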

The last three slots are a recent addition, and for source code
compatibility they are only checked for if one of the three flags
MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags.
This means that most code can continue declaring
a vtable as a 5-element value.  These three are
currently used exclusively by the threading code, and are highly subject
to change.

The current kinds of Magic Virtual Tables are:

=for comment
This table is generated by regen/mg_vtable.pl.  Any changes made here
will be lost.

=for mg_vtable.pl begin

 mg_type
 (old-style char and macro)   MGVTBL         Type of magic
 --------------------------   ------         -------------
 \0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
 #  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
 %  PERL_MAGIC_rhash          (none)         extra data for restricted
                                             hashes
 &  PERL_MAGIC_proto          (none)         my sub prototype CV
 .  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
 :  PERL_MAGIC_symtab         (none)         extra data for symbol
                                             tables
 <  PERL_MAGIC_backref        vtbl_backref   for weak ref data
 @  PERL_MAGIC_arylen_p       (none)         to move arylen out of XPVAV
 B  PERL_MAGIC_bm             vtbl_regexp    Boyer-Moore
                                             (fast string search)
 c  PERL_MAGIC_overload_table vtbl_ovrld     Holds overload table
                                             (AMT) on stash
 D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
                                             (@+ and @- vars)
 d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
                                             element
 E  PERL_MAGIC_env            vtbl_env       %ENV hash
 e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
 f  PERL_MAGIC_fm             vtbl_regexp    Formline
                                             ('compiled' format)
 g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target
 H  PERL_MAGIC_hints          vtbl_hints     %^H hash
 h  PERL_MAGIC_hintselem      vtbl_hintselem %^H hash element
 I  PERL_MAGIC_isa            vtbl_isa       @ISA array
 i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
 k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
 L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
 l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename
                                             element
 N  PERL_MAGIC_shared         (none)         Shared between threads
 n  PERL_MAGIC_shared_scalar  (none)         Shared between threads
 o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale transformation
 P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
 p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
 q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
 r  PERL_MAGIC_qr             vtbl_regexp    precompiled qr// regex
 S  PERL_MAGIC_sig            (none)         %SIG hash
 s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
 t  PERL_MAGIC_taint          vtbl_taint     Taintedness
 U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by
                                             extensions
 u  PERL_MAGIC_uvar_elem      (none)         Reserved for use by
                                             extensions
 V  PERL_MAGIC_vstring        (none)         SV was vstring literal
 v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
 w  PERL_MAGIC_utf8           vtbl_utf8      Cached UTF-8 information
 x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
 y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
                                             variable / smart parameter
                                             vivification
 ]  PERL_MAGIC_checkcall      vtbl_checkcall inlining/mutation of call
                                             to this CV
 ~  PERL_MAGIC_ext            (none)         Available for use by
                                             extensions

=for mg_vtable.pl end

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type.  Some internals code makes use of this case
relationship.  However, 'v' and 'V' (vec and v-string) are in no way related.

The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
specifically for use by extensions and will not be used by perl itself.
Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information
to variables (typically objects).
This is especially useful because
there is no way for normal perl code to corrupt this private information
(unlike using extra elements of a hash object).

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed.  The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(pTHX_ IV, SV*);
        I32 (*uf_set)(pTHX_ IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second.  A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below.  Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));

Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect.

For hashes there is a specialized hook that gives control over hash
keys (but not values).  This hook calls C<PERL_MAGIC_uvar> 'get' magic
if the "set" function in the C<ufuncs> structure is NULL.  The hook
is activated whenever the hash is accessed with a key specified as
an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>,
C<hv_delete_ent>, and C<hv_exists_ent>.  Accessing the key as a string
through the functions without the C<..._ent> suffix circumvents the
hook.  See L<Hash::Util::FieldHash/GUTS> for a detailed description.

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict.
Typically only using the magic on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an
C<MGVTBL>, even if all its fields will be C<0>, so that individual
C<MAGIC> pointers can be identified as a particular kind of magic
using their magic virtual table.  C<mg_findext> provides an easy way
to do that:

    STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };

    MAGIC *mg;
    if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
        /* this is really ours, not another module's PERL_MAGIC_ext */
        my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
        ...
    }

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets.  This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions.  Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

    MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
                                       * type */

This routine returns a pointer to a C<MAGIC> structure stored in the SV.
If the SV does not have that magical
feature, C<NULL> is returned.  If the
SV has multiple instances of that magical feature, the first one will be
returned.
C<mg_findext> can be used
to find a C<MAGIC> structure of an SV
based on both its magic type and its magic virtual table:

    MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);

Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type
SVt_PVMG, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has.  If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats.  Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below.  If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET, etc methods.  To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour.  The code below
carries out the necessary steps - firstly it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods.  Lastly it ties the two hashes together, and returns a
reference to the new tied hash.  Note that the code below does NOT call the
TIEHASH method in the MyTie class -
see L</Calling Perl Routines from within C Programs> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", GV_ADD);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>.  It may also return NULL, indicating that the value did not
actually need to be stored in the array.  [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object.  If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>.  Note the value so returned does not
need to be deallocated, as it is already mortal.  [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object.  Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>,  which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes.
They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched".  Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects.  Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants.  The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax.  The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

    {
      local $var = 2;
      ...
    }

This construction is I<approximately> equivalent to

    {
      my $oldvar = $var;
      $var = 2;
      ...
      $var = $oldvar;
    }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval>, etc.  It is a little bit
more efficient as well.
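
One way to picture how old values can be reinstated automatically is a save
stack: each scope records where the stack stood when it was entered, every
localization pushes the old value, and leaving the scope pops back to the
recorded mark.  The following is a standalone sketch in plain C, not perl's
implementation; C<toy_enter>, C<toy_saveint> and C<toy_leave> are hypothetical
names, only C<int> variables are handled, and non-local exits are not
modelled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX 32

/* One saved value: where it lives and what it used to be. */
typedef struct {
    int *where;
    int  old;
} toy_save;

static toy_save toy_savestack[TOY_SAVE_MAX];
static size_t   toy_save_ix;
static size_t   toy_scopes[TOY_SAVE_MAX];   /* save-stack mark per scope */
static size_t   toy_scope_ix;

/* Open a pseudo-block: remember the current save-stack height. */
static void toy_enter(void)
{
    assert(toy_scope_ix < TOY_SAVE_MAX);
    toy_scopes[toy_scope_ix++] = toy_save_ix;
}

/* Arrange for *p to get its current value back when the block ends. */
static void toy_saveint(int *p)
{
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].where = p;
    toy_savestack[toy_save_ix].old = *p;
    toy_save_ix++;
}

/* Close the pseudo-block: undo its saves, most recent first. */
static void toy_leave(void)
{
    assert(toy_scope_ix > 0);
    size_t floor = toy_scopes[--toy_scope_ix];
    while (toy_save_ix > floor) {
        toy_save *s = &toy_savestack[--toy_save_ix];
        *s->where = s->old;
    }
}
```

Nested scopes unwind independently in this model: leaving the inner block
restores only what was saved inside it, and leaving the outer block restores
the rest.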

There is a way to achieve a similar task from C via Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
die()).  A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used.  (In the second case the overhead of additional localization must
be almost negligible.)  Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.

Inside such a I<pseudo-block> the following service is available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of enclosing I<pseudo-block>.

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>.  C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> would be decremented at the end of
I<pseudo-block>.  This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>.  However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope.  These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count.  This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of I<pseudo-block>.

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>.  The
string pointed to by C<key> is Safefree()ed.  If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

    SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=back

The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s).  Where the above macros take C<int>, a similar
function takes C<int *>.

=over 4

=item C<SV* save_scalar(GV *gv)>

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=item C<HV* save_hash(GV *gv)>

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

Duplicates the current value of C<SV>.  On exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<SV> is restored
from the stored value.  It doesn't handle magic.  Use C<save_scalar> if
magic is affected.

=item C<void save_list(SV **sarg, I32 maxsarg)>

A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.

=item C<SV* save_svref(SV **sptr)>

Similar to C<save_scalar>, but will reinstate an C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>.  People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument.  Argument 0 is the first argument passed in the
Perl subroutine call.  These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives.  However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro. The pushed values will often need to be "mortal" (see
L</Reference Counts and Mortality>):

    PUSHs(sv_2mortal(newSViv(an_integer)))
    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
    PUSHs(sv_2mortal(newSVnv(a_double)))
    PUSHs(sv_2mortal(newSVpv("Some String",0)))
    /* Although the last example is better written as the more
     * efficient: */
    PUSHs(newSVpvs_flags("Some String", SVs_TEMP))

Now, in the Perl program calling C<tzname>, the two values will be assigned
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method to pushing values on the stack is
to use the macro:

    XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.

Despite what earlier versions of this document suggested, the macros
C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.

For more information, consult L<perlxs> and L<perlxstut>.

=head2 Autoloading with XSUBs

If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable
of the XSUB's package.

But it also puts the same information in certain fields of the XSUB itself:

    HV *stash           = CvSTASH(cv);
    const char *subname = SvPVX(cv);
    STRLEN name_length  = SvCUR(cv); /* in bytes */
    U32 is_utf8         = SvUTF8(cv);

C<SvPVX(cv)> contains just the sub name itself, not including the package.
For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package.

B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the
XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need
to support 5.8-5.14, use the XSUB's fields.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program. These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, char**);

The routine most often used is C<call_sv>. The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine. The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of arguments that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack. The macros and functions used for this
include:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.

=head2 Putting a C value on Perl stack

A lot of opcodes (this is an elementary operation in the internal perl
stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L</Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on the stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on the stack.

The macro to put this target on the stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[iunp]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack. The following code will not do what you think:

    XPUSHi(10);
    XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20.
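
The aliasing described above can be reproduced outside Perl with a toy
model. The following is a minimal sketch in plain C, not the Perl API:
C<toy_sv>, C<toy_stack>, C<push_targ_i>, C<push_fresh_i> and the two
C<demo_*> functions are hypothetical names invented for this illustration.
C<push_targ_i> imitates what C<XPUSHi> does with C<TARG>, while
C<push_fresh_i> imitates pushing a separately allocated SV for each value.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_STACK_MAX 8

/* Toy stand-in for an SV: it holds a single integer. */
typedef struct { long iv; } toy_sv;

/* Toy stand-in for the argument stack: an array of SV pointers. */
typedef struct {
    toy_sv *slot[TOY_STACK_MAX];
    int     top;                    /* number of slots in use */
} toy_stack;

/* Models XPUSHi: store the value in the shared target, push the target. */
void push_targ_i(toy_stack *st, toy_sv *targ, long value)
{
    assert(st->top < TOY_STACK_MAX);
    targ->iv = value;
    st->slot[st->top++] = targ;
}

/* Models pushing a new SV per value, so nothing is shared. */
void push_fresh_i(toy_stack *st, long value)
{
    toy_sv *sv = malloc(sizeof *sv);
    assert(sv != NULL && st->top < TOY_STACK_MAX);
    sv->iv = value;
    st->slot[st->top++] = sv;
}

/* Pushes 10 then 20 through one target; returns what slot 0 now holds. */
long demo_shared_target(void)
{
    toy_stack st = { { NULL }, 0 };
    toy_sv targ = { 0 };

    push_targ_i(&st, &targ, 10);
    push_targ_i(&st, &targ, 20);    /* overwrites the value slot 0 points at */
    return st.slot[0]->iv;
}

/* Pushes 10 then 20 as separate SVs; returns what slot 0 now holds. */
long demo_fresh_values(void)
{
    toy_stack st = { { NULL }, 0 };
    long first;

    push_fresh_i(&st, 10);
    push_fresh_i(&st, 20);
    first = st.slot[0]->iv;
    free(st.slot[0]);               /* the toy frees by hand; Perl would  */
    free(st.slot[1]);               /* rely on mortality for real SVs     */
    return first;
}
```

In this model C<demo_shared_target()> returns 20, because both stack slots
point at the one target, whereas C<demo_fresh_values()> returns 10.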

If you need to push multiple different values then you should either use
the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an
SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make
this a little easier to achieve by creating a new mortal for you (via
C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
Thus, instead of writing this to "fix" the example above:

    XPUSHs(sv_2mortal(newSViv(10)))
    XPUSHs(sv_2mortal(newSViv(20)))

you can simply write:

    mXPUSHi(10)
    mXPUSHi(20)

On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>. See also C<dTARGET>
and C<dXSTARG>.

=head2 Scratchpads

The question remains as to when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current
unit--a subroutine or a file (for opcodes for statements outside of
subroutines)--is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.

A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes. A previous version of this document
stated that one can deduce that an SV lives on a scratchpad
by looking at its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set. But this has never been fully true.
C<SVs_PADMY> could be set on a variable that no longer resides in any pad.
While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables
that have never resided in a pad, but nonetheless act like I<target>s.

The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

In fact it is not 100% true that a compiled unit contains a pointer to
the scratchpad AV. In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe B<threads>. Both
these can create several execution pointers going into the same
subroutine. For the subroutine-child not to write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.

=head1 Memory Allocation

=head2 Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section. The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

It is suggested that you enable the version of malloc that is distributed
with Perl.
It keeps pools of various sizes of unallocated memory in
order to satisfy allocation requests more quickly. However, on some
platforms, it may cause spurious malloc or free errors.

The following three macros are used to initially allocate memory:

    Newx(pointer, number, type);
    Newxc(pointer, number, type, cast);
    Newxz(pointer, number, type);

The first argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The second and third arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated. The argument
C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
to zero out all the newly allocated memory.

=head2 Reallocation

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer)

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
match those of C<Newx> and C<Newxc>. (The older C<New> and C<Newc> macros
took an additional "magic cookie" argument, which C<Renew> and C<Renewc>
never needed.)

=head2 Moving

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory. The C<source> and C<dest> arguments point to the source and
destination starting points. Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
function).
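
The calling shape of these macros can be tried out without Perl. The
following is a rough standalone sketch, not the real implementation: the
C<TOY_*> macros and C<demo_alloc> are hypothetical stand-ins built directly
on the C library, whereas the real macros go through whichever malloc perl
was built with. It assumes only what is described above, namely that each
macro takes a pointer, an element count, and a type that is handed to
C<sizeof>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins that mirror the (pointer, number, type) shape. */
#define TOY_NEWXZ(p, n, t)   ((p) = (t *)calloc((size_t)(n), sizeof(t)))
#define TOY_RENEW(p, n, t)   ((p) = (t *)realloc((p), (size_t)(n) * sizeof(t)))
#define TOY_SAFEFREE(p)      free(p)
#define TOY_MOVE(s, d, n, t) memmove((d), (s), (size_t)(n) * sizeof(t))
#define TOY_ZERO(d, n, t)    memset((d), 0, (size_t)(n) * sizeof(t))

/* Allocates 3 longs, grows the buffer to 6, shifts the data right by one
 * element, clears the first element, and returns the sum of the first 4. */
long demo_alloc(void)
{
    long *buf;
    long sum = 0;
    int i;

    TOY_NEWXZ(buf, 3, long);            /* zero-filled, like Newxz */
    assert(buf != NULL);
    for (i = 0; i < 3; i++)
        buf[i] = i + 1;                 /* buf = {1, 2, 3} */

    TOY_RENEW(buf, 6, long);            /* first 3 elements are preserved */
    assert(buf != NULL);

    TOY_MOVE(buf, buf + 1, 3, long);    /* buf[1..3] = {1, 2, 3}; overlaps */
    TOY_ZERO(buf, 1, long);             /* buf[0] = 0 */

    for (i = 0; i < 4; i++)
        sum += buf[i];                  /* 0 + 1 + 2 + 3 */

    TOY_SAFEFREE(buf);
    return sum;
}
```

The source and destination of the move overlap here, which is why the
sketch uses C<memmove> rather than C<memcpy> for its stand-in.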

=head1 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used. This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with. All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

    $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:

     $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line.
The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), only 3 of which are
not optimized away (one per number in the left column). The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, thus it is C<3
4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children.
In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the last.

There are also two other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.

Another way to examine the tree is to use a compiler back-end module, such
as L<B::Concise>.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass. This is optimization by
so-called "check routines". The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread. Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, the check routine returns
its argument).

By convention, check routines have names C<ck_*>.
They are usually
called from C<new*OP> subroutines (or C<convert>) (which in turn are
called from F<perly.y>).

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called the returned node is
checked for being compile-time executable. If it is (the value is
judged to be constant) it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead. The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.

=head2 Compile pass 2: context propagation

When a context for a part of compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of free()ing (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). Optimizations performed
at this stage are subject to the same restrictions as in pass 2.

Peephole optimizations are done by calling the function pointed to
by the global variable C<PL_peepp>. By default, C<PL_peepp> just
calls the function pointed to by the global variable C<PL_rpeepp>.
By default, that performs some basic op fixups and optimisations along
the execution-order op chain, and recursively calls C<PL_rpeepp> for
each side chain of ops (resulting from conditionals). Extensions may
provide additional optimisations or fixups, hooking into either the
per-subroutine or recursive stage, like this:

    static peep_t prev_peepp;
    static void my_peep(pTHX_ OP *o)
    {
        /* custom per-subroutine optimisation goes here */
        prev_peepp(aTHX_ o);
        /* custom per-subroutine optimisation may also go here */
    }
    BOOT:
        prev_peepp = PL_peepp;
        PL_peepp = my_peep;

    static peep_t prev_rpeepp;
    static void my_rpeep(pTHX_ OP *o)
    {
        OP *orig_o = o;
        for(; o; o = o->op_next) {
            /* custom per-op optimisation goes here */
        }
        prev_rpeepp(aTHX_ orig_o);
    }
    BOOT:
        prev_rpeepp = PL_rpeepp;
        PL_rpeepp = my_rpeep;

=head2 Pluggable runops

The compile tree is executed in a runops function. There are two runops
functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
control over the execution of the compile tree it is possible to provide
your own runops function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs. Then, in the BOOT section of your XS
file, add the line:

    PL_runops = my_runops;

This function should be as efficient as possible to keep your programs
running as fast as possible.

=head2 Compile-time scope hooks

As of perl 5.14 it is possible to hook into the compile-time lexical
scope mechanism using C<Perl_blockhook_register>.
This is used like
this:

    STATIC void my_start_hook(pTHX_ int full);
    STATIC BHK my_hooks;

    BOOT:
        BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
        Perl_blockhook_register(aTHX_ &my_hooks);

This will arrange to have C<my_start_hook> called at the start of
compiling every lexical scope. The available hooks are:

=over 4

=item C<void bhk_start(pTHX_ int full)>

This is called just after starting a new lexical scope. Note that Perl
code like

    if ($x) { ... }

creates two scopes: the first starts at the C<(> and has C<full == 1>,
the second starts at the C<{> and has C<full == 0>. Both end at the
C<}>, so calls to C<start> and C<pre/post_end> will match. Anything
pushed onto the save stack by this hook will be popped just before the
scope ends (between the C<pre_> and C<post_end> hooks, in fact).

=item C<void bhk_pre_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just before unwinding the
stack. I<o> is the root of the optree representing the scope; it is a
double pointer so you can replace the OP if you need to.

=item C<void bhk_post_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just after unwinding the
stack. I<o> is as above. Note that it is possible for calls to C<pre_>
and C<post_end> to nest, if there is something on the save stack that
calls string eval.

=item C<void bhk_eval(pTHX_ OP *const o)>

This is called just before starting to compile an C<eval STRING>, C<do
FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
OP that requested the eval, and will normally be an C<OP_ENTEREVAL>,
C<OP_DOFILE> or C<OP_REQUIRE>.

=back

Once you have your hook functions, you need a C<BHK> structure to put
them in.
It's best to allocate it statically, since there is no way to
free it once it's registered. The function pointers should be inserted
into this structure using the C<BhkENTRY_set> macro, which will also set
flags indicating which entries are valid. If you do need to allocate
your C<BHK> dynamically for some reason, be sure to zero it before you
start.

Once registered, there is no mechanism to switch these hooks off, so if
that is necessary you will need to do this yourself. An entry in C<%^H>
is probably the best way, so the effect is lexically scoped; however it
is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to
temporarily switch entries on and off. You should also be aware that
generally speaking at least one scope will have opened before your
extension is loaded, so you will see some C<pre/post_end> pairs that
didn't have a matching C<start>.

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the
subroutines in a package like so: (Thankfully, these are all xsubs, so
there is no op tree)

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

and C<Perl_dump_all>, which dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use. This smells a lot like an object, and
there are ways for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure,
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.

One macro controls the major Perl build flavor: MULTIPLICITY. The
MULTIPLICITY build has a C structure that packages all the interpreter
state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
normally defined, and enables the support for passing in a "hidden" first
argument that represents all three data structures. MULTIPLICITY makes
multi-threaded perls possible (with the ithreads threading model, related
to the macro USE_ITHREADS.)

Two other "encapsulation" macros are the PERL_GLOBAL_STRUCT and
PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the
former turns on MULTIPLICITY.)
The PERL_GLOBAL_STRUCT causes all the
internal variables of Perl to be wrapped inside a single global struct,
struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or
the function Perl_GetVars(). The PERL_GLOBAL_STRUCT_PRIVATE goes
one step further: there is still a single struct (allocated in main()
either from heap or from stack) but there are no global data symbols
pointing to it. In either case the global struct should be initialized
as the very first thing in main() using Perl_init_global_struct() and
correspondingly torn down after perl_free() using
Perl_free_global_struct(); please see F<miniperlmain.c> for usage
details. You may also need
to use C<dVAR> in your coding to "declare the global variables"
when you are using them. dTHX does this for you automatically.

To see whether you have non-const data you can use a BSD-compatible C<nm>:

    nm libperl.a | grep -v ' [TURtr] '

If this displays any C<D> or C<d> symbols, you have non-const data.

For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
doesn't actually hide all symbols inside a big global struct: some
PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE
then hides everything (see how the PERLIO_FUNCS_DECL is used).

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin with C<S_> are
private (think "S" for "secret" or "static").
All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be
B<sure> a function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), send mail via
L<perlbug> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing. To solve this, the subroutines are named and
declared in a particular way. Here's a typical start of a static
function used within the Perl guts:

    STATIC void
    S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

    void
    Perl_sv_setiv(pTHX_ SV* dsv, IV num)

C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
first argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it.
If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.

When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setiv>. It expands into
something like this:

    #ifdef PERL_IMPLICIT_CONTEXT
      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setiv           Perl_sv_setiv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setiv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance. Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead
it does dTHX; to get the context from thread-local storage. We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance. (Passing an arg is
cheaper than grabbing it from thread-local storage.)

You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow. The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with PERL_IMPLICIT_CONTEXT enabled.

There are three ways to do this. First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

    sv_setiv(sv, num);

in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:

    Perl_sv_setiv(Perl_get_context(), sv, num);

or to this otherwise:

    Perl_sv_setiv(sv, num);

You don't have to do anything new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your Foo.xs:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    STATIC void my_private_function(int arg1, int arg2);

    STATIC void
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        ... call many Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API. (You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.

The third, even more efficient way is to ape how it is done within
the Perl guts:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    STATIC void my_private_function(pTHX_ int arg1, int arg2);

    STATIC void
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit arguments.
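
To see why the underscore form exists, it can help to look at a toy
version of the macro scheme that compiles without the Perl headers. The
sketch below is only a mock: C<DEMO_IMPLICIT_CONTEXT>, C<demo_interp> and
every other C<demo_*> name are invented for illustration and are not part
of the Perl API, and the real macros do considerably more.

```c
#include <assert.h>

/* Hypothetical stand-in for an interpreter struct; NOT Perl's real one. */
typedef struct demo_interp { int api_calls; } demo_interp;

/* Comment this out to build the "no implicit context" variant. */
#define DEMO_IMPLICIT_CONTEXT

#ifdef DEMO_IMPLICIT_CONTEXT
#  define demo_pTHX   demo_interp *my_interp  /* parameter declaration  */
#  define demo_pTHX_  demo_pTHX,              /* same, plus the comma   */
#  define demo_aTHX   my_interp               /* argument at call sites */
#  define demo_aTHX_  demo_aTHX,              /* same, plus the comma   */
#  define demo_THX    my_interp
#else
static demo_interp demo_global_interp;        /* single global context  */
#  define demo_pTHX   void
#  define demo_pTHX_
#  define demo_aTHX
#  define demo_aTHX_
#  define demo_THX    (&demo_global_interp)
#endif

/* Takes explicit arguments, so it uses the underscore form. */
static void demo_set_iv(demo_pTHX_ int *slot, int value)
{
    demo_THX->api_calls++;
    *slot = value;
}

/* Takes no explicit arguments, so it uses the form without underscore. */
static int demo_api_calls(demo_pTHX)
{
    return demo_THX->api_calls;
}

/* Makes two "API" calls and returns how many the context recorded. */
static int demo_run(void)
{
#ifdef DEMO_IMPLICIT_CONTEXT
    demo_interp interp = { 0 };
    demo_interp *my_interp = &interp;  /* plays the role of dTHX */
#endif
    int slot = 0;

    demo_set_iv(demo_aTHX_ &slot, 42);
    demo_set_iv(demo_aTHX_ &slot, 7);
    assert(slot == 7);
    return demo_api_calls(demo_aTHX);
}
```

Because the comma lives inside the macro, the same source compiles in
both modes; writing C<demo_pTHX,> by hand would leave a stray comma in
the variant where the macro expands to nothing.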

If Perl is compiled with C<-DPERL_GLOBAL_STRUCT>, a C<dVAR>
declaration is needed in any function that accesses the Perl global
variables (see F<perlvars.h> or F<globvar.sym>) and does not use
C<dTHX> (C<dTHX> includes C<dVAR> if necessary). The need for C<dVAR>
shows up only with that compile-time define, because
otherwise the Perl global variables are visible as-is.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards. If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

    /* do this before doing anything else with some_perl */
    PERL_SET_CONTEXT(some_perl);

    ... other Perl API calls on some_perl go here ...

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
Windows.

This makes it possible to provide an extra pointer (called the "host"
environment) for all the system calls. The system-level code can then
maintain its own state, broken down into
seven C structures. These are thin wrappers around the usual system
calls (see F<win32/perllib.c>) for the default perl executable, but for a
more ambitious host (like the one that would do fork() emulation) all
the extra work needed to pretend that different interpreters are
actually different "processes" would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
either with or without the C<Perl_> prefix, thanks to a bunch of defines
that live in F<embed.h>. Note that extension code should I<not> set
C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
breakage of the XS in each new perl release.

The file F<embed.h> is generated automatically from
F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces. It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well.
Here's a sample entry from
that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The second column is the return type, the third column the name. Columns
after that are the arguments. The first column is a set of flags:

=over 3

=item A

This function is a part of the public
API. All such functions should also
have 'd'; very few do not.

=item p

This function has a C<Perl_> prefix; i.e. it is defined as
C<Perl_av_fetch>.

=item d

This function has documentation using the C<apidoc> feature which we'll
look at in a second. Some functions have 'd' but not 'A'; docs are good.

=back

Other available flags are:

=over 3

=item s

This is a static function and is defined as C<STATIC S_whatever>, and
usually called within the sources as C<whatever(...)>.

=item n

This does not need an interpreter context, so the definition has no
C<pTHX>, and it follows that callers don't use C<aTHX>. (See
L</Background and PERL_IMPLICIT_CONTEXT>.)

=item r

This function never returns; C<croak>, C<exit> and friends.

=item f

This function takes a variable number of arguments, C<printf> style.
The argument list should end with C<...>, like this:

    Afprd   |void   |croak          |const char* pat|...

=item M

This function is part of the experimental development API, and may change
or disappear without notice.

=item o

This function should not have a compatibility macro to define, say,
C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.

=item x

This function isn't exported out of the Perl core.

=item m

This is implemented as a macro.

=item X

This function is explicitly exported.

=item E

This function is visible to extensions included in the Perl core.

=item b

Binary backward compatibility; this function is a macro but also has
a C<Perl_> implementation (which is exported).

=item others

See the comments at the top of C<embed.fnc> for others.

=back

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

    IVdf    IV in decimal
    UVuf    UV in decimal
    UVof    UV in octal
    UVxf    UV in hexadecimal
    NVef    NV %e-like
    NVff    NV %f-like
    NVgf    NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

    printf("IV is %" IVdf "\n", iv);

The IVdf will expand to whatever is the correct format for the IVs.

If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to convert between the two correctly.

    PTR2UV(pointer)
    PTR2IV(pointer)
    PTR2NV(pointer)
    INT2PTR(pointertotype, integer)

For example:

    IV  iv = ...;
    SV *sv = INT2PTR(SV*, iv);

and

    AV *av = ...;
    UV  uv = PTR2UV(av);

=head2 Exception Handling

There are a couple of macros to do very basic exception handling in XS
modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
be able to use these macros:

    #define NO_XSLOCKS
    #include "XSUB.h"

You can use these macros if you call code that may croak, but you need
to do some cleanup before giving control back to Perl.
For example:

    dXCPT;    /* set up necessary variables */

    XCPT_TRY_START {
      code_that_may_croak();
    } XCPT_TRY_END

    XCPT_CATCH
    {
      /* do cleanup here */
      XCPT_RETHROW;
    }

Note that you always have to rethrow an exception that has been
caught. Using these macros, it is not possible to just catch the
exception and ignore it. If you have to ignore the exception, you
have to use one of the C<call_*> functions.

The advantage of using the above macros is that you don't have
to set up an extra function for C<call_*>, and that using these
macros is faster than using C<call_*>.

=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them - L<perlapi> is one
such manual which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

 /*
 =for apidoc sv_setiv

 Copies an integer into the given SV. Does not handle 'set' magic. See
 C<sv_setiv_mg>.

 =cut
 */

Please try and supply some documentation if you add functions to the
Perl core.

=head2 Backwards compatibility

The Perl API changes over time. New functions are
added or the interfaces of existing functions are
changed. The C<Devel::PPPort> module tries to
provide compatibility code for some of these changes, so XS writers don't
have to code it themselves when supporting multiple versions of Perl.

C<Devel::PPPort> generates a C header file F<ppport.h> that can also
be run as a Perl script.
To generate F<ppport.h>, run:

    perl -MDevel::PPPort -eDevel::PPPort::WriteFile

Besides checking existing XS code, the script can also be used to retrieve
compatibility information for various API calls using the C<--api-info>
command line switch. For example:

    % perl ppport.h --api-info=sv_magicext

For details, see C<perldoc ppport.h>.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
a variable number of bytes to represent a character. You can learn more
about Unicode and Perl's Unicode model in L<perlunicode>.

=head2 How can I recognise a UTF-8 string?

You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking - this
is what makes Unicode input an interesting problem.

In general, you either have to know what you're dealing with, or you
have to guess. The API function C<is_utf8_string> can help; it'll tell
you if a string contains only valid UTF-8 characters. However, it can't
do the work for you. On a character-by-character basis,
C<is_utf8_char_buf>
will tell you whether the current character in a string is valid UTF-8.

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one
byte, just like good ol' ASCII. Character 128 is stored as
C<v194.128>; this continues up to character 191, which is
C<v194.191>. Now we've run out of bits (191 is binary
C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.

Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.
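
The byte patterns described in this section can be reproduced with a
few lines of plain C. The sketch below does not use the Perl API:
C<demo_utf8_skip> and C<demo_utf8_encode> are hypothetical helpers written
for this illustration, and they assume standard UTF-8 only (code points
up to 0x10FFFF, at most four bytes), which is narrower than what Perl's
own C<UTF8SKIP> and C<uvchr_to_utf8> accept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Perl API: number of bytes in the
 * character whose first byte is `lead` (standard UTF-8 only). */
static size_t demo_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: plain ASCII    */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte start   */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte start   */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte start   */
    return 1;  /* continuation or invalid byte: step over one byte */
}

/* Hypothetical helper: encode code point `cp` into `out`, which must
 * have room for 4 bytes, and return the number of bytes written.
 * Surrogate code points are not rejected in this sketch. */
static size_t demo_utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80) {                      /* 0..127: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 128..2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {                 /* 2048..65535: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For instance, C<demo_utf8_encode(128, buf)> writes the bytes 194 and 128,
matching C<v194.128> above, and C<demo_utf8_skip> applied to the first
byte of C<"\305\233\340\240\201"> gives 2.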

All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (the UTF8_IS_INVARIANT() is a macro that tests
whether the byte is encoded as a single byte even in UTF-8):

    U8 *utf;
    U8 *utf_end; /* 1 beyond buffer pointed to by utf */
    UV uv;       /* Note: a UV, not a U8, not a char */
    STRLEN len;  /* length of character in bytes */

    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
value of the character; the inverse function C<uvchr_to_utf8> is available
for putting a UV into UTF-8:

    if (!UTF8_IS_INVARIANT(uv))
        /* Must treat this as UTF8 */
        utf8 = uvchr_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!

=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with Unicode strings and non-Unicode strings
slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
string is internally encoded as UTF-8. Without it, the byte value is the
codepoint number and vice versa (in other words, the string is encoded
as iso-8859-1, but C<use feature 'unicode_strings'> is needed to get iso-8859-1
semantics).
This flag is only meaningful if the SV is C<SvPOK>
or immediately after stringification via C<SvPV> or a similar
macro. You can check and manipulate this flag with the
following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
especially when combining non-UTF-8 and UTF-8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act
accordingly:

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
    if (SvUTF8(sv))
        SvUTF8_on(nsv);

In fact, your C<frobnicate> function should be made aware of whether or
not it's dealing with UTF-8 data, so that it can handle the string
appropriately.

Since copying an SV's data inside an XS function is not enough to
carry over the UTF8 flag, passing a bare C<char *> to an XS function
is even less adequate.

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
one of the strings to UTF-8.
If you've got an SV, the easiest way to do
this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems in deficient code.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a string is UTF-8 or not. You can tell if an SV
is UTF-8 by looking at its C<SvUTF8> flag after stringifying it
with C<SvPV> or a similar macro. Don't forget to set the flag if
something should be UTF-8. Treat the flag as part of the PV, even though
it's not - if you pass on the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character C<uv> to a UTF-8 string, B<always> use
C<uvchr_to_utf8>, unless C<UTF8_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is
tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded, and then combine them.

=back

=head1 Custom Operators

Custom operator support is an experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>).

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure - unary, binary, list and
so on - you like.

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to do those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages.
Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> to determine which custom op
it is dealing with. You should create an C<XOP> structure for each
ppaddr you use, set the properties of the custom op with
C<XopENTRY_set>, and register the structure against the ppaddr using
C<Perl_custom_op_register>. A trivial example might look like:

    static XOP my_xop;
    static OP *my_pp(pTHX);

    BOOT:
        XopENTRY_set(&my_xop, xop_name, "myxop");
        XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
        Perl_custom_op_register(aTHX_ my_pp, &my_xop);

The available fields in the structure are:

=over 4

=item xop_name

A short name for your op. This will be included in some error messages,
and will also be returned as C<< $op->name >> by the L<B|B> module, so
it will appear in the output of modules like L<B::Concise|B::Concise>.

=item xop_desc

A short description of the function of the op.

=item xop_class

Which of the various C<*OP> structures this op uses. This should be one of
the C<OA_*> constants from F<op.h>, namely

=over 4

=item OA_BASEOP

=item OA_UNOP

=item OA_BINOP

=item OA_LOGOP

=item OA_LISTOP

=item OA_PMOP

=item OA_SVOP

=item OA_PADOP

=item OA_PVOP_OR_SVOP

This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead.

=item OA_LOOP

=item OA_COP

=back

The other C<OA_*> constants should not be used.

=item xop_peep

This member is of type C<Perl_cpeep_t>, which expands to C<void
(*Perl_cpeep_t)(aTHX_ OP *o, OP *oldop)>.
If it is set, this function
will be called from C<Perl_rpeep> when ops of this type are encountered
by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.

=back

C<B::Generate> directly supports the creation of custom ops by name.

=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed>