=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value". You might think that it is misnamed
because it is described as pointing only to strings. However, it is
possible to have it point to other things. For example, it could point
to an array of UVs. But,
using it for non-strings requires care, as the underlying assumption of
much of the internals is that PVs are just for strings. Often, for
example, a trailing C<NUL> is tacked on automatically. The non-string use
is documented only in this paragraph.)

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

In the unlikely case of an SV requiring more complex initialization, you
can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has the undef value.

    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                          * allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                      SV **, Size_t, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L</Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a C<NUL> character.
If it is not C<NUL>-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a C<NUL>-terminated string.
Perl's own functions typically add a trailing C<NUL> for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl.
In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a C<NUL>.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. The order in which function arguments are evaluated is
unspecified, so C<len> may be read before C<SvPV> has set it. It might
work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

If you want to write to an existing SV's buffer and set its value to a
string, use SvPV_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
without extra copying:

    (void)SvPVbyte_force(sv, len);
    s = SvGROW(sv, len + needlen + 1);
    /* something that may modify up to needlen bytes at s+len, but
       actually modifies newlen bytes
         eg. newlen = read(fd, s + len, needlen);
       ignoring errors for these examples
     */
    s[len + newlen] = '\0';
    SvCUR_set(sv, len + newlen);
    SvUTF8_off(sv);
    SvSETMAGIC(sv);

If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn().
If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().

If you don't need the existing content of the SV, you can avoid some
copying with:

    SvPVCLEAR(sv);
    s = SvGROW(sv, needlen + 1);
    /* something that may modify up to needlen bytes at s, but
       actually modifies newlen bytes
         eg. newlen = read(fd, s, needlen);
     */
    s[newlen] = '\0';
    SvCUR_set(sv, newlen);
    SvPOK_only(sv); /* also clears SVf_UTF8 */
    SvSETMAGIC(sv);

Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().

If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags(). That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:

    Newx(buf, somesize+1, char);
    /* ... fill in buf ... */
    buf[somesize] = '\0';
    sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
    /* buf now belongs to perl, don't release it */

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.
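
The relationship between the allocated size, the current length and the
trailing C<NUL> can be modelled outside perl. What follows is a toy
sketch in plain C, not perl's implementation: the C<toy_sv> struct and
the functions C<toy_grow>, C<toy_cat> and C<toy_end> are hypothetical
names that only mimic the roles of C<SvGROW>, C<SvCUR_set> and C<SvEND>
described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string SV. This is NOT perl's real layout. */
typedef struct {
    char  *pv;   /* start of the buffer        (cf. SvPVX) */
    size_t cur;  /* bytes in use, without NUL  (cf. SvCUR) */
    size_t len;  /* bytes allocated            (cf. SvLEN) */
} toy_sv;

/* Like SvGROW: only ever enlarges the buffer, and reserves exactly
 * newlen bytes, so the caller must ask for room for the NUL itself. */
char *toy_grow(toy_sv *sv, size_t newlen)
{
    if (newlen > sv->len) {
        char *p = realloc(sv->pv, newlen);
        if (p == NULL)
            abort();
        sv->pv  = p;
        sv->len = newlen;
    }
    return sv->pv;
}

/* Append n bytes using the pattern from the text: grow to
 * cur + n + 1, write at the old end, terminate, then update cur. */
void toy_cat(toy_sv *sv, const char *src, size_t n)
{
    char *s = toy_grow(sv, sv->cur + n + 1);
    memcpy(s + sv->cur, src, n);
    s[sv->cur + n] = '\0';
    sv->cur += n;               /* cf. SvCUR_set */
}

/* Pointer to the terminating NUL (cf. SvEND). */
char *toy_end(const toy_sv *sv)
{
    assert(sv->pv != NULL);     /* only meaningful once a string is stored */
    return sv->pv + sv->cur;
}
```

Starting from C<toy_sv sv = { NULL, 0, 0 }>, each call to C<toy_cat>
leaves the buffer C<NUL>-terminated, and a later C<toy_grow> request
for a smaller size leaves the allocation unchanged.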

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                      Size_t, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function
extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L</Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", 0);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example,
when interfacing with Perl code, the comparison will work correctly for:

    foo(undef);

But it won't work when called as:

    $x = undef;
    foo($x);

So, to repeat: always use SvOK() to check whether an sv is defined.
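
The reason the address comparison fails can be shown with a small model
in plain C. This is only a sketch under simplifying assumptions:
C<toy_sv>, C<toy_undef>, C<toy_ok> and the other names are hypothetical
stand-ins for an SV, C<PL_sv_undef> and C<SvOK>, and the "defined"
state is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy scalar: a "defined" flag plus a payload. Not perl's layout. */
typedef struct {
    bool ok;    /* does the scalar hold a defined value? */
    long iv;    /* payload, meaningful only when ok is true */
} toy_sv;

/* One shared, global undef value (cf. PL_sv_undef). */
toy_sv toy_undef = { false, 0 };

/* Fragile test: true only for the shared sentinel itself. */
bool toy_is_undef_sentinel(const toy_sv *sv)
{
    return sv == &toy_undef;
}

/* Robust test: inspects the scalar's own state (cf. SvOK). */
bool toy_ok(const toy_sv *sv)
{
    return sv->ok;
}

/* Copy a value into an existing variable (cf. sv_setsv). */
void toy_set(toy_sv *dst, const toy_sv *src)
{
    *dst = *src;
}
```

In this model, C<foo(undef)> corresponds to passing C<&toy_undef>
itself, whereas C<$x = undef; foo($x)> corresponds to passing a
different C<toy_sv> whose C<ok> flag happens to be false. Only
C<toy_ok> gives the same answer in both cases.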

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L</AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L</Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly. (A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.
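
The bookkeeping behind the offset hack can be sketched in plain C. The
model below is an illustration only, not perl's code: C<toy_sv>,
C<toy_new>, C<toy_chop>, C<toy_real_start> and C<toy_free> are
hypothetical names, and the chopped-byte count is kept in an ordinary
struct field here rather than wherever perl actually stores it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy string SV with an explicit offset field. Not perl's layout. */
typedef struct {
    char  *pv;      /* visible start of the string   (cf. SvPVX) */
    size_t cur;     /* visible length                (cf. SvCUR) */
    size_t len;     /* bytes allocated from pv on    (cf. SvLEN) */
    size_t offset;  /* bytes chopped off the front so far */
} toy_sv;

toy_sv toy_new(const char *s)
{
    toy_sv sv;
    sv.cur = strlen(s);
    sv.len = sv.cur + 1;            /* room for the trailing NUL */
    sv.pv  = malloc(sv.len);
    if (sv.pv == NULL)
        abort();
    memcpy(sv.pv, s, sv.len);
    sv.offset = 0;
    return sv;
}

/* Discard everything before ptr without moving any bytes
 * (the idea behind sv_chop): advance pv, shrink cur and len. */
void toy_chop(toy_sv *sv, const char *ptr)
{
    size_t delta;
    assert(ptr >= sv->pv && ptr <= sv->pv + sv->cur);
    delta = (size_t)(ptr - sv->pv);
    sv->pv     += delta;
    sv->cur    -= delta;
    sv->len    -= delta;
    sv->offset += delta;
}

/* The real start of the allocation, needed when freeing it. */
char *toy_real_start(const toy_sv *sv)
{
    return sv->pv - sv->offset;
}

void toy_free(toy_sv *sv)
{
    free(toy_real_start(sv));
    sv->pv = NULL;
}
```

Chopping one byte from a nine-byte string in this model leaves a
visible length of 8, an allocated length of 9 and an offset of 1, while
the pointer that must eventually be freed is still the original one.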

This is best demonstrated by example. Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play. In
the current implementation, the final byte of a string buffer is used as a
copy-on-write reference count. If the buffer is not big enough, then
copy-on-write is skipped. First have a look at an empty string:

  % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
  SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0x7ffb7bc05b50 ""\0
    CUR = 0
    LEN = 10

Notice here the LEN is 10. (It may differ on your platform.) Extend the
length of the string to one less than 10, and do a substitution:

  % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
                                                            Dump($a)'
  SV = PV(0x7ffa04008a70) at 0x7ffa04030390
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    OFFSET = 1
    PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
    CUR = 8
    LEN = 9

Here the number of bytes chopped off (1) is shown next as the OFFSET. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one. (The first character of the string
buffer happens to have changed to "\1" here, not "1", because the current
implementation stores the offset count in the string buffer. This is
subject to change.)

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
usually these macros will always return TRUE and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(SSize_t num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on it:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, SSize_t num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    SSize_t av_top_index(AV*);
    SV**    av_fetch(AV*, SSize_t key, I32 lval);
    SV**    av_store(AV*, SSize_t key, SV* val);

The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

A few more:

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, SSize_t key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", 0);

This returns NULL if the variable does not exist.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on it:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

The first of these two functions checks if a hash table entry exists, and the
second deletes it.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. The C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value is returned in the SV* argument */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", 0);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH> macro:

    PERL_HASH(hash, key, klen)

The exact implementation of this macro varies by architecture and version
of perl, and the return value may change per invocation, so the value
is only valid for the duration of a single perl process.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code. In perl 5.20, storing
&PL_sv_undef will create a read-only element, because the scalar
&PL_sv_undef itself is stored, not a copy.

Similar problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including other references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_PVAV    Array
    SVt_PVHV    Hash
    SVt_PVCV    Code
    SVt_PVGV    Glob (possibly a file handle)

Any numerical value returned which is less than SVt_PVAV will be a scalar
of some form.

See L<perlapi/svtype> for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L</Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

The following function upgrades C<rv> to a reference if it is not already
one, and creates a new SV for C<rv> to point to. If C<classname> is
non-null, the SV is blessed into the specified class. The new SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

The following three functions copy an integer, unsigned integer or double
into an SV whose reference is C<rv>. The SV is blessed if C<classname> is
non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

The following function copies the pointer value (I<the address, not the
string!>) into an SV whose reference is C<rv>. The SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

The following function copies a string into an SV whose reference is C<rv>.
Set length to 0 to let Perl calculate the string length. The SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
                                                         STRLEN length);

The following function tests whether the SV is blessed into the specified
class. It does not check inheritance relationships.

    int  sv_isa(SV* sv, const char* name);

The following function tests whether the SV is a reference to a blessed object.

    int  sv_isobject(SV* sv);

The following function tests whether the SV is derived from the specified
class. The SV can be either a reference to a blessed object or a string
containing a class name.
This is the function implementing the
C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", GV_ADD);
    AV*  get_av("package::varname", GV_ADD);
    HV*  get_hv("package::varname", GV_ADD);

Notice the use of GV_ADD as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<GV_ADD> argument to enable certain extra features. Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

  Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

  Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.
805At the most basic internal level, reference counts can be manipulated 806with the following macros: 807 808 int SvREFCNT(SV* sv); 809 SV* SvREFCNT_inc(SV* sv); 810 void SvREFCNT_dec(SV* sv); 811 812(There are also suffixed versions of the increment and decrement macros, 813for situations where the full generality of these basic macros can be 814exchanged for some performance.) 815 816However, the way a programmer should think about references is not so 817much in terms of the bare reference count, but in terms of I<ownership> 818of references. A reference to an xV can be owned by any of a variety 819of entities: another xV, the Perl interpreter, an XS data structure, 820a piece of running code, or a dynamic scope. An xV generally does not 821know what entities own the references to it; it only knows how many 822references there are, which is the reference count. 823 824To correctly maintain reference counts, it is essential to keep track 825of what references the XS code is manipulating. The programmer should 826always know where a reference has come from and who owns it, and be 827aware of any creation or destruction of references, and any transfers 828of ownership. Because ownership isn't represented explicitly in the xV 829data structures, only the reference count need be actually maintained 830by the code, and that means that this understanding of ownership is not 831actually evident in the code. For example, transferring ownership of a 832reference from one owner to another doesn't change the reference count 833at all, so may be achieved with no actual code. (The transferring code 834doesn't touch the referenced object, but does need to ensure that the 835former owner knows that it no longer owns the reference, and that the 836new owner knows that it now does.) 837 838An xV that is visible at the Perl level should not become unreferenced 839and thus be destroyed. 
Normally, an object will only become unreferenced 840when it is no longer visible, often by the same means that makes it 841invisible. For example, a Perl reference value (RV) owns a reference to 842its referent, so if the RV is overwritten that reference gets destroyed, 843and the no-longer-reachable referent may be destroyed as a result. 844 845Many functions have some kind of reference manipulation as 846part of their purpose. Sometimes this is documented in terms 847of ownership of references, and sometimes it is (less helpfully) 848documented in terms of changes to reference counts. For example, the 849L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV 850(with reference count 1) and increment the reference count of the referent 851that was supplied by the caller. This is best understood as creating 852a new reference to the referent, which is owned by the created RV, 853and returning to the caller ownership of the sole reference to the RV. 854The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not 855increment the reference count of the referent, but the RV nevertheless 856ends up owning a reference to the referent. It is therefore implied 857that the caller of C<newRV_noinc()> is relinquishing a reference to the 858referent, making this conceptually a more complicated operation even 859though it does less to the data structures. 860 861For example, imagine you want to return a reference from an XSUB 862function. Inside the XSUB routine, you create an SV which initially 863has just a single reference, owned by the XSUB routine. This reference 864needs to be disposed of before the routine is complete, otherwise it 865will leak, preventing the SV from ever being destroyed. So to create 866an RV referencing the SV, it is most convenient to pass the SV to 867C<newRV_noinc()>, which consumes that reference. 
Now the XSUB routine 868no longer owns a reference to the SV, but does own a reference to the RV, 869which in turn owns a reference to the SV. The ownership of the reference 870to the RV is then transferred by the process of returning the RV from 871the XSUB. 872 873There are some convenience functions available that can help with the 874destruction of xVs. These functions introduce the concept of "mortality". 875Much documentation speaks of an xV itself being mortal, but this is 876misleading. It is really I<a reference to> an xV that is mortal, and it 877is possible for there to be more than one mortal reference to a single xV. 878For a reference to be mortal means that it is owned by the temps stack, 879one of perl's many internal stacks, which will destroy that reference 880"a short time later". Usually the "short time later" is the end of 881the current Perl statement. However, it gets more complicated around 882dynamic scopes: there can be multiple sets of mortal references hanging 883around at the same time, with different death dates. Internally, the 884actual determinant for when mortal xV references are destroyed depends 885on two macros, SAVETMPS and FREETMPS. See L<perlcall> and L<perlxs> 886and L</Temporaries Stack> below for more details on these macros. 887 888Mortal references are mainly used for xVs that are placed on perl's 889main stack. The stack is problematic for reference tracking, because it 890contains a lot of xV references, but doesn't own those references: they 891are not counted. Currently, there are many bugs resulting from xVs being 892destroyed while referenced by the stack, because the stack's uncounted 893references aren't enough to keep the xVs alive. So when putting an 894(uncounted) reference on the stack, it is vitally important to ensure that 895there will be a counted reference to the same xV that will last at least 896as long as the uncounted reference. 
But it's also important that the
counted reference be cleaned up at an appropriate time, and not unduly
prolong the xV's life.  A mortal reference is often the
best way to satisfy this requirement, especially if the xV was created
specifically to be put on the stack and would otherwise be unreferenced.

To create a mortal reference, use the functions:

    SV*  sv_newmortal()
    SV*  sv_mortalcopy(SV*)
    SV*  sv_2mortal(SV*)

C<sv_newmortal()> creates an SV (with the undefined value) whose sole
reference is mortal.  C<sv_mortalcopy()> creates an xV whose value is a
copy of a supplied xV and whose sole reference is mortal.  C<sv_2mortal()>
mortalises an existing xV reference: it transfers ownership of a reference
from the caller to the temps stack.  Because C<sv_newmortal> gives the new
SV no value, it must normally be given one via C<sv_setpv>, C<sv_setiv>,
etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that takes multiple C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

The mortal routines are not just for SVs; AVs and HVs can be
made mortal by passing their address (cast to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package.  Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value).  This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package.
To get at the items in other packages, append the 945string "::" to the package name. The items in the C<Foo> package are in 946the stash C<Foo::> in PL_defstash. The items in the C<Bar::Baz> package are 947in the stash C<Baz::> in C<Bar::>'s stash. 948 949To get the stash pointer for a particular package, use the function: 950 951 HV* gv_stashpv(const char* name, I32 flags) 952 HV* gv_stashsv(SV*, I32 flags) 953 954The first function takes a literal string, the second uses the string stored 955in the SV. Remember that a stash is just a hash table, so you get back an 956C<HV*>. The C<flags> flag will create a new package if it is set to GV_ADD. 957 958The name that C<gv_stash*v> wants is the name of the package whose symbol table 959you want. The default package is called C<main>. If you have multiply nested 960packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl 961language itself. 962 963Alternately, if you have an SV that is a blessed reference, you can find 964out the stash pointer by using: 965 966 HV* SvSTASH(SvRV(SV*)); 967 968then use the following to get the package name itself: 969 970 char* HvNAME(HV* stash); 971 972If you need to bless or re-bless an object you can use the following 973function: 974 975 SV* sv_bless(SV*, HV* stash) 976 977where the first argument, an C<SV*>, must be a reference, and the second 978argument is a stash. The returned C<SV*> can now be used in the same way 979as any other SV. 980 981For more information on references and blessings, consult L<perlref>. 982 983=head2 Double-Typed SVs 984 985Scalar variables normally contain only one type of value, an integer, 986double, pointer, or reference. Perl will automatically convert the 987actual scalar data from the stored type into the requested type. 988 989Some scalar variables contain more than one type of scalar data. 
For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data.  The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first.  This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int   dberror;
    extern char *dberror_list[];   /* one message string per error code */

    SV* sv = get_sv("dberror", GV_ADD);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Read-Only Values

In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
flag bit with read-only scalars.  So the only way to test whether
C<sv_setsv>, etc., will raise a "Modification of a read-only value" error
in those versions is:

    SvREADONLY(sv) && !SvIsCOW(sv)

Under Perl 5.18 and later, C<SvREADONLY> only applies to read-only variables,
and, under 5.20, copy-on-write scalars can also be read-only, so the above
check is incorrect.
You just want:

    SvREADONLY(sv)

If you need to do this check often, define your own macro like this:

    #if PERL_VERSION >= 18
    # define SvTRULYREADONLY(sv) SvREADONLY(sv)
    #else
    # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
    #endif

=head2 Copy on Write

Perl implements a copy-on-write (COW) mechanism for scalars, in which
string copies are not immediately made when requested, but are deferred
until made necessary by one or the other scalar changing.  This is mostly
transparent, but one must take care not to modify string buffers that are
shared by multiple SVs.

You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>.

You can force an SV to make its own copy of its string buffer by calling
C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>.

If you want to make the SV drop its string buffer, use
C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply
C<sv_setsv(sv, NULL)>.

All of these functions will croak on read-only scalars (see the previous
section for more on those).

To test that your code is behaving correctly and not modifying COW buffers,
on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with
C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations
into crashes.  You will find it to be marvellously slow, so you may want to
skip perl's own tests.

=head2 Magic Variables

[This section still under construction.  Ignore everything here.  Post no
bills.  Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have.  These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.
1079 1080 struct magic { 1081 MAGIC* mg_moremagic; 1082 MGVTBL* mg_virtual; 1083 U16 mg_private; 1084 char mg_type; 1085 U8 mg_flags; 1086 I32 mg_len; 1087 SV* mg_obj; 1088 char* mg_ptr; 1089 }; 1090 1091Note this is current as of patchlevel 0, and could change at any time. 1092 1093=head2 Assigning Magic 1094 1095Perl adds magic to an SV using the sv_magic function: 1096 1097 void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen); 1098 1099The C<sv> argument is a pointer to the SV that is to acquire a new magical 1100feature. 1101 1102If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to 1103convert C<sv> to type C<SVt_PVMG>. 1104Perl then continues by adding new magic 1105to the beginning of the linked list of magical features. Any prior entry 1106of the same type of magic is deleted. Note that this can be overridden, 1107and multiple instances of the same type of magic can be associated with an 1108SV. 1109 1110The C<name> and C<namlen> arguments are used to associate a string with 1111the magic, typically the name of a variable. C<namlen> is stored in the 1112C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of 1113C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on 1114whether C<namlen> is greater than zero or equal to zero respectively. As a 1115special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed 1116to contain an C<SV*> and is stored as-is with its REFCNT incremented. 1117 1118The sv_magic function uses C<how> to determine which, if any, predefined 1119"Magic Virtual Table" should be assigned to the C<mg_virtual> field. 1120See the L</Magic Virtual Tables> section below. The C<how> argument is also 1121stored in the C<mg_type> field. The value of 1122C<how> should be chosen from the set of macros 1123C<PERL_MAGIC_foo> found in F<perl.h>. 
Note that before 1124these macros were added, Perl internals used to directly use character 1125literals, so you may occasionally come across old code or documentation 1126referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example. 1127 1128The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC> 1129structure. If it is not the same as the C<sv> argument, the reference 1130count of the C<obj> object is incremented. If it is the same, or if 1131the C<how> argument is C<PERL_MAGIC_arylen>, C<PERL_MAGIC_regdatum>, 1132C<PERL_MAGIC_regdata>, or if it is a NULL pointer, then C<obj> is merely 1133stored, without the reference count being incremented. 1134 1135See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic 1136to an SV. 1137 1138There is also a function to add magic to an C<HV>: 1139 1140 void hv_magic(HV *hv, GV *gv, int how); 1141 1142This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>. 1143 1144To remove the magic from an SV, call the function sv_unmagic: 1145 1146 int sv_unmagic(SV *sv, int type); 1147 1148The C<type> argument should be equal to the C<how> value when the C<SV> 1149was initially made magical. 1150 1151However, note that C<sv_unmagic> removes all magic of a certain C<type> from the 1152C<SV>. If you want to remove only certain 1153magic of a C<type> based on the magic 1154virtual table, use C<sv_unmagicext> instead: 1155 1156 int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl); 1157 1158=head2 Magic Virtual Tables 1159 1160The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an 1161C<MGVTBL>, which is a structure of function pointers and stands for 1162"Magic Virtual Table" to handle the various operations that might be 1163applied to that variable. 
1164 1165The C<MGVTBL> has five (or sometimes eight) pointers to the following 1166routine types: 1167 1168 int (*svt_get) (pTHX_ SV* sv, MAGIC* mg); 1169 int (*svt_set) (pTHX_ SV* sv, MAGIC* mg); 1170 U32 (*svt_len) (pTHX_ SV* sv, MAGIC* mg); 1171 int (*svt_clear)(pTHX_ SV* sv, MAGIC* mg); 1172 int (*svt_free) (pTHX_ SV* sv, MAGIC* mg); 1173 1174 int (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv, 1175 const char *name, I32 namlen); 1176 int (*svt_dup) (pTHX_ MAGIC *mg, CLONE_PARAMS *param); 1177 int (*svt_local)(pTHX_ SV *nsv, MAGIC *mg); 1178 1179 1180This MGVTBL structure is set at compile-time in F<perl.h> and there are 1181currently 32 types. These different structures contain pointers to various 1182routines that perform additional actions depending on which function is 1183being called. 1184 1185 Function pointer Action taken 1186 ---------------- ------------ 1187 svt_get Do something before the value of the SV is 1188 retrieved. 1189 svt_set Do something after the SV is assigned a value. 1190 svt_len Report on the SV's length. 1191 svt_clear Clear something the SV represents. 1192 svt_free Free any extra storage associated with the SV. 1193 1194 svt_copy copy tied variable magic to a tied element 1195 svt_dup duplicate a magic structure during thread cloning 1196 svt_local copy magic to local value during 'local' 1197 1198For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds 1199to an C<mg_type> of C<PERL_MAGIC_sv>) contains: 1200 1201 { magic_get, magic_set, magic_len, 0, 0 } 1202 1203Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>, 1204if a get operation is being performed, the routine C<magic_get> is 1205called. All the various routines for the various magical types begin 1206with C<magic_>. NOTE: the magic routines are not considered part of 1207the Perl API, and may not be exported by the Perl library. 
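
The dispatch just described can be pictured with a small stand-alone C
model.  This is only a sketch under stated assumptions: the names
C<toy_sv>, C<toy_magic>, C<toy_vtbl>, C<toy_get>, C<toy_mg_get> and
C<toy_fetch_iv> are hypothetical and are not part of the Perl API, the
structures are far simpler than the real ones, and the real callbacks also
take a C<pTHX_> argument.  It shows a "get" hook being reached through a
function pointer before a value is read, and a table with a null slot
being skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for SV, MAGIC and MGVTBL; not the real Perl structures. */
struct toy_sv;
struct toy_magic;

typedef struct toy_vtbl {
    int (*get)(struct toy_sv *sv, struct toy_magic *mg);  /* like svt_get */
    int (*set)(struct toy_sv *sv, struct toy_magic *mg);  /* like svt_set */
} toy_vtbl;

typedef struct toy_magic {
    struct toy_magic *moremagic;  /* next list entry, like mg_moremagic */
    const toy_vtbl   *virt;       /* like mg_virtual; may be NULL */
    long             *source;     /* private data, loosely like mg_ptr */
} toy_magic;

typedef struct toy_sv {
    long       iv;     /* cached integer value */
    toy_magic *magic;  /* head of the magic list, NULL if not magical */
} toy_sv;

/* A 'get' hook: refresh the cached value from an external C variable. */
static int toy_get(toy_sv *sv, toy_magic *mg)
{
    sv->iv = *mg->source;
    return 0;
}

/* Run every 'get' hook on the list, skipping tables with a null slot. */
static void toy_mg_get(toy_sv *sv)
{
    toy_magic *mg;
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic) {
        if (mg->virt != NULL && mg->virt->get != NULL)
            mg->virt->get(sv, mg);
    }
}

/* Read the integer value, invoking 'get' magic first. */
static long toy_fetch_iv(toy_sv *sv)
{
    assert(sv != NULL);
    toy_mg_get(sv);
    return sv->iv;
}
```

In this model a caller would build a C<toy_vtbl> whose C<get> slot is
C<toy_get> and whose C<set> slot is null, attach it to a C<toy_sv>, and
read the value with C<toy_fetch_iv>; a C<toy_sv> with no magic list is
read directly.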
1208 1209The last three slots are a recent addition, and for source code 1210compatibility they are only checked for if one of the three flags 1211MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags. 1212This means that most code can continue declaring 1213a vtable as a 5-element value. These three are 1214currently used exclusively by the threading code, and are highly subject 1215to change. 1216 1217The current kinds of Magic Virtual Tables are: 1218 1219=for comment 1220This table is generated by regen/mg_vtable.pl. Any changes made here 1221will be lost. 1222 1223=for mg_vtable.pl begin 1224 1225 mg_type 1226 (old-style char and macro) MGVTBL Type of magic 1227 -------------------------- ------ ------------- 1228 \0 PERL_MAGIC_sv vtbl_sv Special scalar variable 1229 # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary) 1230 % PERL_MAGIC_rhash (none) Extra data for restricted 1231 hashes 1232 * PERL_MAGIC_debugvar vtbl_debugvar $DB::single, signal, trace 1233 vars 1234 . PERL_MAGIC_pos vtbl_pos pos() lvalue 1235 : PERL_MAGIC_symtab (none) Extra data for symbol 1236 tables 1237 < PERL_MAGIC_backref vtbl_backref For weak ref data 1238 @ PERL_MAGIC_arylen_p (none) To move arylen out of XPVAV 1239 B PERL_MAGIC_bm vtbl_regexp Boyer-Moore 1240 (fast string search) 1241 c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table 1242 (AMT) on stash 1243 D PERL_MAGIC_regdata vtbl_regdata Regex match position data 1244 (@+ and @- vars) 1245 d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data 1246 element 1247 E PERL_MAGIC_env vtbl_env %ENV hash 1248 e PERL_MAGIC_envelem vtbl_envelem %ENV hash element 1249 f PERL_MAGIC_fm vtbl_regexp Formline 1250 ('compiled' format) 1251 g PERL_MAGIC_regex_global vtbl_mglob m//g target 1252 H PERL_MAGIC_hints vtbl_hints %^H hash 1253 h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element 1254 I PERL_MAGIC_isa vtbl_isa @ISA array 1255 i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element 1256 k PERL_MAGIC_nkeys vtbl_nkeys 
scalar(keys()) lvalue 1257 L PERL_MAGIC_dbfile (none) Debugger %_<filename 1258 l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename 1259 element 1260 N PERL_MAGIC_shared (none) Shared between threads 1261 n PERL_MAGIC_shared_scalar (none) Shared between threads 1262 o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation 1263 P PERL_MAGIC_tied vtbl_pack Tied array or hash 1264 p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element 1265 q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle 1266 r PERL_MAGIC_qr vtbl_regexp Precompiled qr// regex 1267 S PERL_MAGIC_sig (none) %SIG hash 1268 s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element 1269 t PERL_MAGIC_taint vtbl_taint Taintedness 1270 U PERL_MAGIC_uvar vtbl_uvar Available for use by 1271 extensions 1272 u PERL_MAGIC_uvar_elem (none) Reserved for use by 1273 extensions 1274 V PERL_MAGIC_vstring (none) SV was vstring literal 1275 v PERL_MAGIC_vec vtbl_vec vec() lvalue 1276 w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information 1277 x PERL_MAGIC_substr vtbl_substr substr() lvalue 1278 Y PERL_MAGIC_nonelem vtbl_nonelem Array element that does not 1279 exist 1280 y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator 1281 variable / smart parameter 1282 vivification 1283 \ PERL_MAGIC_lvref vtbl_lvref Lvalue reference 1284 constructor 1285 ] PERL_MAGIC_checkcall vtbl_checkcall Inlining/mutation of call 1286 to this CV 1287 ~ PERL_MAGIC_ext (none) Available for use by 1288 extensions 1289 1290 1291=for apidoc Amnh||PERL_MAGIC_sv 1292=for apidoc Amnh||PERL_MAGIC_arylen 1293=for apidoc Amnh||PERL_MAGIC_rhash 1294=for apidoc Amnh||PERL_MAGIC_debugvar 1295=for apidoc Amnh||PERL_MAGIC_pos 1296=for apidoc Amnh||PERL_MAGIC_symtab 1297=for apidoc Amnh||PERL_MAGIC_backref 1298=for apidoc Amnh||PERL_MAGIC_arylen_p 1299=for apidoc Amnh||PERL_MAGIC_bm 1300=for apidoc Amnh||PERL_MAGIC_overload_table 1301=for apidoc Amnh||PERL_MAGIC_regdata 1302=for apidoc Amnh||PERL_MAGIC_regdatum 1303=for apidoc 
Amnh||PERL_MAGIC_env 1304=for apidoc Amnh||PERL_MAGIC_envelem 1305=for apidoc Amnh||PERL_MAGIC_fm 1306=for apidoc Amnh||PERL_MAGIC_regex_global 1307=for apidoc Amnh||PERL_MAGIC_hints 1308=for apidoc Amnh||PERL_MAGIC_hintselem 1309=for apidoc Amnh||PERL_MAGIC_isa 1310=for apidoc Amnh||PERL_MAGIC_isaelem 1311=for apidoc Amnh||PERL_MAGIC_nkeys 1312=for apidoc Amnh||PERL_MAGIC_dbfile 1313=for apidoc Amnh||PERL_MAGIC_dbline 1314=for apidoc Amnh||PERL_MAGIC_shared 1315=for apidoc Amnh||PERL_MAGIC_shared_scalar 1316=for apidoc Amnh||PERL_MAGIC_collxfrm 1317=for apidoc Amnh||PERL_MAGIC_tied 1318=for apidoc Amnh||PERL_MAGIC_tiedelem 1319=for apidoc Amnh||PERL_MAGIC_tiedscalar 1320=for apidoc Amnh||PERL_MAGIC_qr 1321=for apidoc Amnh||PERL_MAGIC_sig 1322=for apidoc Amnh||PERL_MAGIC_sigelem 1323=for apidoc Amnh||PERL_MAGIC_taint 1324=for apidoc Amnh||PERL_MAGIC_uvar 1325=for apidoc Amnh||PERL_MAGIC_uvar_elem 1326=for apidoc Amnh||PERL_MAGIC_vstring 1327=for apidoc Amnh||PERL_MAGIC_vec 1328=for apidoc Amnh||PERL_MAGIC_utf8 1329=for apidoc Amnh||PERL_MAGIC_substr 1330=for apidoc Amnh||PERL_MAGIC_nonelem 1331=for apidoc Amnh||PERL_MAGIC_defelem 1332=for apidoc Amnh||PERL_MAGIC_lvref 1333=for apidoc Amnh||PERL_MAGIC_checkcall 1334=for apidoc Amnh||PERL_MAGIC_ext 1335 1336=for mg_vtable.pl end 1337 1338When an uppercase and lowercase letter both exist in the table, then the 1339uppercase letter is typically used to represent some kind of composite type 1340(a list or a hash), and the lowercase letter is used to represent an element 1341of that composite type. Some internals code makes use of this case 1342relationship. However, 'v' and 'V' (vec and v-string) are in no way related. 1343 1344The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined 1345specifically for use by extensions and will not be used by perl itself. 1346Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information 1347to variables (typically objects). 
This is especially useful because 1348there is no way for normal perl code to corrupt this private information 1349(unlike using extra elements of a hash object). 1350 1351Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a 1352C function any time a scalar's value is used or changed. The C<MAGIC>'s 1353C<mg_ptr> field points to a C<ufuncs> structure: 1354 1355 struct ufuncs { 1356 I32 (*uf_val)(pTHX_ IV, SV*); 1357 I32 (*uf_set)(pTHX_ IV, SV*); 1358 IV uf_index; 1359 }; 1360 1361When the SV is read from or written to, the C<uf_val> or C<uf_set> 1362function will be called with C<uf_index> as the first arg and a pointer to 1363the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar> 1364magic is shown below. Note that the ufuncs structure is copied by 1365sv_magic, so you can safely allocate it on the stack. 1366 1367 void 1368 Umagic(sv) 1369 SV *sv; 1370 PREINIT: 1371 struct ufuncs uf; 1372 CODE: 1373 uf.uf_val = &my_get_fn; 1374 uf.uf_set = &my_set_fn; 1375 uf.uf_index = 0; 1376 sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf)); 1377 1378Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect. 1379 1380For hashes there is a specialized hook that gives control over hash 1381keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic 1382if the "set" function in the C<ufuncs> structure is NULL. The hook 1383is activated whenever the hash is accessed with a key specified as 1384an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>, 1385C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string 1386through the functions without the C<..._ent> suffix circumvents the 1387hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description. 1388 1389Note that because multiple extensions may be using C<PERL_MAGIC_ext> 1390or C<PERL_MAGIC_uvar> magic, it is important for extensions to take 1391extra care to avoid conflict. 
Typically only using the magic on 1392objects blessed into the same class as the extension is sufficient. 1393For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an 1394C<MGVTBL>, even if all its fields will be C<0>, so that individual 1395C<MAGIC> pointers can be identified as a particular kind of magic 1396using their magic virtual table. C<mg_findext> provides an easy way 1397to do that: 1398 1399 STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 }; 1400 1401 MAGIC *mg; 1402 if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) { 1403 /* this is really ours, not another module's PERL_MAGIC_ext */ 1404 my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr; 1405 ... 1406 } 1407 1408Also note that the C<sv_set*()> and C<sv_cat*()> functions described 1409earlier do B<not> invoke 'set' magic on their targets. This must 1410be done by the user either by calling the C<SvSETMAGIC()> macro after 1411calling these functions, or by using one of the C<sv_set*_mg()> or 1412C<sv_cat*_mg()> functions. Similarly, generic C code must call the 1413C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV 1414obtained from external sources in functions that don't handle magic. 1415See L<perlapi> for a description of these functions. 1416For example, calls to the C<sv_cat*()> functions typically need to be 1417followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()> 1418since their implementation handles 'get' magic. 1419 1420=head2 Finding Magic 1421 1422 MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that 1423 * type */ 1424 1425This routine returns a pointer to a C<MAGIC> structure stored in the SV. 1426If the SV does not have that magical 1427feature, C<NULL> is returned. If the 1428SV has multiple instances of that magical feature, the first one will be 1429returned. 
C<mg_findext> can be used 1430to find a C<MAGIC> structure of an SV 1431based on both its magic type and its magic virtual table: 1432 1433 MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl); 1434 1435Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type 1436SVt_PVMG, Perl may core dump. 1437 1438 int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen); 1439 1440This routine checks to see what types of magic C<sv> has. If the mg_type 1441field is an uppercase letter, then the mg_obj is copied to C<nsv>, but 1442the mg_type field is changed to be the lowercase letter. 1443 1444=head2 Understanding the Magic of Tied Hashes and Arrays 1445 1446Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied> 1447magic type. 1448 1449WARNING: As of the 5.004 release, proper usage of the array and hash 1450access functions requires understanding a few caveats. Some 1451of these caveats are actually considered bugs in the API, to be fixed 1452in later releases, and are bracketed with [MAYCHANGE] below. If 1453you find yourself actually applying such information in this section, be 1454aware that the behavior may change in the future, umm, without warning. 1455 1456The perl tie function associates a variable with an object that implements 1457the various GET, SET, etc methods. To perform the equivalent of the perl 1458tie function from an XSUB, you must mimic this behaviour. The code below 1459carries out the necessary steps -- firstly it creates a new hash, and then 1460creates a second hash which it blesses into the class which will implement 1461the tie methods. Lastly it ties the two hashes together, and returns a 1462reference to the new tied hash. Note that the code below does NOT call the 1463TIEHASH method in the MyTie class - 1464see L</Calling Perl Routines from within C Programs> for details on how 1465to do this. 

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();                      /* the hash to be tied */
        tie = newRV_noinc((SV*)newHV());     /* the object behind the tie */
        stash = gv_stashpv("MyTie", GV_ADD);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        RETVAL = newRV_noinc((SV*)hash);     /* newRV_noinc takes an SV* */
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>.  It may also return NULL, indicating that the value did not
actually need to be stored in the array.  [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object.  If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>.  Note that the value so returned does
not need to be deallocated, as it is already mortal.  [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object.  Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes.
They 1509merely call C<mg_copy> to attach magic to the values that were meant to be 1510"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually 1511do the job of invoking the TIE methods on the underlying objects. Thus 1512the magic mechanism currently implements a kind of lazy access to arrays 1513and hashes. 1514 1515Currently (as of perl version 5.004), use of the hash and array access 1516functions requires the user to be aware of whether they are operating on 1517"normal" hashes and arrays, or on their tied variants. The API may be 1518changed to provide more transparent access to both tied and normal data 1519types in future versions. 1520[/MAYCHANGE] 1521 1522You would do well to understand that the TIEARRAY and TIEHASH interfaces 1523are mere sugar to invoke some perl method calls while using the uniform hash 1524and array syntax. The use of this sugar imposes some overhead (typically 1525about two to four extra opcodes per FETCH/STORE operation, in addition to 1526the creation of all the mortal variables required to invoke the methods). 1527This overhead will be comparatively small if the TIE methods are themselves 1528substantial, but if they are only a few statements long, the overhead 1529will not be insignificant. 1530 1531=head2 Localizing changes 1532 1533Perl has a very handy construction 1534 1535 { 1536 local $var = 2; 1537 ... 1538 } 1539 1540This construction is I<approximately> equivalent to 1541 1542 { 1543 my $oldvar = $var; 1544 $var = 2; 1545 ... 1546 $var = $oldvar; 1547 } 1548 1549The biggest difference is that the first construction would 1550reinstate the initial value of $var, irrespective of how control exits 1551the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit 1552more efficient as well. 
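
The save-and-restore idea behind C<local> can be modelled in plain C as a
stack of saved values that is unwound when a scope is left. The sketch
below is only a toy model: every name in it (C<scope_enter>, C<save_int>,
C<scope_leave>) is hypothetical and is not part of the Perl API, and it
shows only the explicit exit path, not the unwinding that Perl also
performs on a non-local exit.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a save stack; all names are hypothetical, not Perl API. */
#define SAVE_MAX  64
#define SCOPE_MAX 16

typedef struct { int *addr; int old; } saved_int;

saved_int save_stack[SAVE_MAX];
size_t    save_top = 0;             /* next free slot on the save stack */
size_t    scope_marks[SCOPE_MAX];   /* value of save_top at each scope entry */
size_t    scope_depth = 0;

/* Open a pseudo-block: remember how tall the save stack is right now. */
void scope_enter(void)
{
    assert(scope_depth < SCOPE_MAX);
    scope_marks[scope_depth++] = save_top;
}

/* Record the current value of *p so that scope_leave() can restore it. */
void save_int(int *p)
{
    assert(save_top < SAVE_MAX);
    save_stack[save_top].addr = p;
    save_stack[save_top].old  = *p;
    save_top++;
}

/* Close the pseudo-block: undo, newest first, everything saved since
 * the matching scope_enter(). */
void scope_leave(void)
{
    size_t mark;
    assert(scope_depth > 0);
    mark = scope_marks[--scope_depth];
    while (save_top > mark) {
        save_top--;
        *save_stack[save_top].addr = save_stack[save_top].old;
    }
}
```

A caller would bracket its work with C<scope_enter()> and C<scope_leave()>
and call C<save_int(&var)> before changing C<var>. Because entries are
undone newest first, nested scopes that save the same variable restore it
correctly one level at a time.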

There is a way to achieve a similar task from C via Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly or via a non-local exit (via
die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization must
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.

Inside such a I<pseudo-block> the following services are available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of enclosing I<pseudo-block>.

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of
I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope. These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count. This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of I<pseudo-block>.

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed. If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

    SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=back

The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s). Where the above macros take C<int>, a similar
function takes C<int *>.

=over 4

=item C<SV* save_scalar(GV *gv)>

=for apidoc save_scalar

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=for apidoc save_ary

=item C<HV* save_hash(GV *gv)>

=for apidoc save_hash

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

=for apidoc save_item

Duplicates the current value of C<SV>. On the exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block> the value of C<SV> will be restored
using the stored value. It doesn't handle magic. Use C<save_scalar> if
magic is affected.

=item C<void save_list(SV **sarg, I32 maxsarg)>

=for apidoc save_list

A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.

=item C<SV* save_svref(SV **sptr)>

=for apidoc save_svref

Similar to C<save_scalar>, but will reinstate an C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

=for apidoc save_aptr
=for apidoc save_hptr

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>. People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument. Argument 0 is the first argument passed in the
Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives. However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro. The pushed values will often need to be "mortal" (see
L</Reference Counts and Mortality>):

    PUSHs(sv_2mortal(newSViv(an_integer)))
    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
    PUSHs(sv_2mortal(newSVnv(a_double)))
    PUSHs(sv_2mortal(newSVpv("Some String",0)))
    /* Although the last example is better written as the more
     * efficient: */
    PUSHs(newSVpvs_flags("Some String", SVs_TEMP))

Now, in the Perl program calling C<tzname>, the two values will be
assigned as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method to pushing values on the stack is
to use the macro:

    XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.

Despite their suggestions in earlier versions of this document the macros
C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.
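
The division of labour between C<EXTEND>, C<PUSHs> and C<XPUSHs> can be
illustrated with a toy stack in plain C. This is a sketch only: the type
C<vstack> and the functions C<vstack_extend>, C<vstack_push> and
C<vstack_xpush> are hypothetical names invented for the illustration, the
values are plain C<int>s rather than C<SV*>, and nothing here reflects how
perl actually manages its stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy value stack; every name here is hypothetical, not Perl API. */
typedef struct {
    int   *base;   /* first slot */
    size_t len;    /* slots in use */
    size_t cap;    /* slots allocated */
} vstack;

/* Make sure at least n more values fit (the role EXTEND plays).
 * Returns 0 on success, -1 if memory could not be obtained. */
int vstack_extend(vstack *s, size_t n)
{
    if (s->cap - s->len < n) {
        size_t newcap = s->len + n;
        int *p = realloc(s->base, newcap * sizeof *p);
        if (!p)
            return -1;
        s->base = p;
        s->cap  = newcap;
    }
    return 0;
}

/* Push without checking for room (the role PUSHs plays);
 * the caller must have extended the stack beforehand. */
void vstack_push(vstack *s, int v)
{
    assert(s->len < s->cap);
    s->base[s->len++] = v;
}

/* Extend by one slot if needed, then push (the role XPUSHs plays). */
int vstack_xpush(vstack *s, int v)
{
    if (vstack_extend(s, 1) != 0)
        return -1;
    vstack_push(s, v);
    return 0;
}
```

Starting from an empty C<vstack> (all fields zero), a caller that knows it
will return two values can extend once and push twice, while a caller that
does not know the count in advance can use the checking variant for every
value at the cost of one extra test per push.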

For more information, consult L<perlxs> and L<perlxstut>.

=head2 Autoloading with XSUBs

If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable
of the XSUB's package.

But it also puts the same information in certain fields of the XSUB itself:

    HV *stash           = CvSTASH(cv);
    const char *subname = SvPVX(cv);
    STRLEN name_length  = SvCUR(cv); /* in bytes */
    U32 is_utf8         = SvUTF8(cv);

C<SvPVX(cv)> contains just the sub name itself, not including the package.
For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package.

B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the
XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need
to support 5.8-5.14, use the XSUB's fields.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program. These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, char**);

The routine most often used is C<call_sv>. The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine. The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of values that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack. The tools for doing so include the
following macros and functions:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.

=head2 Putting a C value on Perl stack

A lot of opcodes (an opcode is an elementary operation in the internal
perl stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L</Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on stack.

The macro to put this target on stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[iunp]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack. The following code will not do what you think:

    XPUSHi(10);
    XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20.
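
The aliasing problem just described does not depend on anything specific
to perl, so it can be reproduced in a few lines of plain C. The following
is a toy model under stated assumptions: the stack holds C<int *> instead
of C<SV*>, and the names C<targ>, C<push_via_targ> and C<push_fresh> are
hypothetical stand-ins invented for this sketch, not Perl API.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: the "stack" holds pointers to values, much as perl's
 * stack holds SV*.  All names are hypothetical. */
#define STACK_MAX 8

int   *stack[STACK_MAX];
size_t stack_len = 0;

int targ;   /* the single, reusable target */

/* In the spirit of XPUSHi: set the shared target, push a pointer to it. */
void push_via_targ(int v)
{
    assert(stack_len < STACK_MAX);
    targ = v;
    stack[stack_len++] = &targ;
}

/* In the spirit of pushing a fresh value each time.  A real mortal
 * would be freed automatically; in this model the caller must free().
 * Returns 0 on success, -1 if memory could not be obtained. */
int push_fresh(int v)
{
    int *p;
    assert(stack_len < STACK_MAX);
    p = malloc(sizeof *p);
    if (!p)
        return -1;
    *p = v;
    stack[stack_len++] = p;
    return 0;
}
```

After C<push_via_targ(10)> and C<push_via_targ(20)> both stack slots hold
the same address, so both read back as 20. After C<push_fresh(10)> and
C<push_fresh(20)> the slots hold distinct addresses and keep their own
values.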

If you need to push multiple different values then you should either use
the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an
SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make
this a little easier to achieve by creating a new mortal for you (via
C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
Thus, instead of writing this to "fix" the example above:

    XPUSHs(sv_2mortal(newSViv(10)))
    XPUSHs(sv_2mortal(newSViv(20)))

you can simply write:

    mXPUSHi(10)
    mXPUSHi(20)

On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>. See also C<dTARGET>
and C<dXSTARG>.

=head2 Scratchpads

The question remains on when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current
unit--a subroutine or a file (for opcodes for statements outside of
subroutines)--is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.

A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes. A previous version of this document
stated that one can deduce that an SV lives on a scratchpad
by looking at its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set. But this has never been fully true.
C<SVs_PADMY> could be set on a variable that no longer resides in any pad.
While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables
that have never resided in a pad, but nonetheless act like I<target>s. As
of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as
0. C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>.

The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

In fact it is not 100% true that a compiled unit contains a pointer to
the scratchpad AV. In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe B<threads>. Both
these can create several execution pointers going into the same
subroutine. For the subroutine-child not to write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.

=head1 Memory Allocation

=head2 Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section. The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

The following three macros are used to initially allocate memory:

    Newx(pointer, number, type);
    Newxc(pointer, number, type, cast);
    Newxz(pointer, number, type);

The first argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The second and third arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated. The argument
C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
to zero out all the newly allocated memory.

=head2 Reallocation

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer)

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
match those of C<Newx> and C<Newxc>.

=head2 Moving

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory. The C<source> and C<dest> arguments point to the source and
destination starting points. Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
function).

=head1 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used. This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with. All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

    $a = $b + $c;

This is converted to a tree similar to this one:

            assign-to
          /           \
         +             $a
       /   \
     $b     $c

(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:

    $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line. The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), only 3 of them are
not optimized away (one per number in the left column). The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, thus it is
C<3 4 5 6> (node C<6> is not included into above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<OpSIBLING> pointer from the first child to the last (but
see below).

There are also some other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.

Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one
or more children, but it doesn't have an C<op_last> field: so you have to
follow C<op_first> and then the C<OpSIBLING> chain itself to find the
last child. Instead it has an C<op_other> field, which is comparable to
the C<op_next> field described below, and represents an alternate
execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note
that in general, C<op_other> may not point to any of the direct children
of the C<LOGOP>.

Starting in version 5.21.2, perls built with the experimental
define C<-DPERL_OP_PARENT> add an extra boolean flag for each op,
C<op_moresib>. When not set, this indicates that this is the last op in an
C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last
sibling to point back to the parent op. Under this build, that field is
also renamed C<op_sibparent> to reflect its joint role. The macro
C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on
the last sibling. With this build the C<op_parent(o)> function can be
used to find the parent of any op. Thus for forward compatibility, you
should always use the C<OpSIBLING(o)> macro rather than accessing
C<op_sibling> directly.
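
The joint sibling/parent pointer described in the previous paragraph can
be sketched with a small stand-alone structure. This is a toy model, not
perl's C<struct op>: the type C<toy_op> and the functions C<toy_sibling>,
C<toy_parent> and C<toy_child_count> are hypothetical names, and only the
three fields needed for the illustration are present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy op node.  The field names echo the text above, but this is a
 * hypothetical structure, not perl's struct op. */
typedef struct toy_op {
    struct toy_op *first;      /* first child, or NULL */
    struct toy_op *sibparent;  /* next sibling if moresib, else parent */
    bool           moresib;    /* true if another sibling follows */
} toy_op;

/* Next sibling, or NULL when o is the last op in its chain. */
toy_op *toy_sibling(const toy_op *o)
{
    assert(o != NULL);
    return o->moresib ? o->sibparent : NULL;
}

/* Walk to the last sibling; its sibparent field is the parent
 * (NULL for a root that has no parent). */
toy_op *toy_parent(const toy_op *o)
{
    assert(o != NULL);
    while (o->moresib)
        o = o->sibparent;
    return o->sibparent;
}

/* Count the children of o by following the sibling chain. */
size_t toy_child_count(const toy_op *o)
{
    size_t n = 0;
    const toy_op *kid;
    assert(o != NULL);
    for (kid = o->first; kid; kid = toy_sibling(kid))
        n++;
    return n;
}
```

Code that iterates with the accessor (here C<toy_sibling>) never needs to
know whether the underlying field currently holds a sibling or a parent,
which is the reason the text recommends the macro over direct field
access.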

Another way to examine the tree is to use a compiler back-end module, such
as L<B::Concise>.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass. This is optimization by
so-called "check routines". The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread. Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines (or C<convert>) (which in turn are
called from F<perly.y>).

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called the returned node is
checked for being compile-time executable. If it is (the value is
judged to be constant) it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead. The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.

=head2 Compile pass 2: context propagation

When a context for a part of compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of free()ing (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). Optimizations performed
at this stage are subject to the same restrictions as in pass 2.

Peephole optimizations are done by calling the function pointed to
by the global variable C<PL_peepp>. By default, C<PL_peepp> just
calls the function pointed to by the global variable C<PL_rpeepp>.
By default, that performs some basic op fixups and optimisations along
the execution-order op chain, and recursively calls C<PL_rpeepp> for
each side chain of ops (resulting from conditionals). Extensions may
provide additional optimisations or fixups, hooking into either the
per-subroutine or recursive stage, like this:

    static peep_t prev_peepp;
    static void my_peep(pTHX_ OP *o)
    {
        /* custom per-subroutine optimisation goes here */
        prev_peepp(aTHX_ o);
        /* custom per-subroutine optimisation may also go here */
    }
    BOOT:
        prev_peepp = PL_peepp;
        PL_peepp = my_peep;

    static peep_t prev_rpeepp;
    static void my_rpeep(pTHX_ OP *o)
    {
        OP *orig_o = o;
        for(; o; o = o->op_next) {
            /* custom per-op optimisation goes here */
        }
        prev_rpeepp(aTHX_ orig_o);
    }
    BOOT:
        prev_rpeepp = PL_rpeepp;
        PL_rpeepp = my_rpeep;

=head2 Pluggable runops

The compile tree is executed in a runops function. There are two runops
functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
control over the execution of the compile tree it is possible to provide
your own runops function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs. Then, in the BOOT section of your XS
file, add the line:

    PL_runops = my_runops;

This function should be as efficient as possible to keep your programs
running as fast as possible.

=head2 Compile-time scope hooks

As of perl 5.14 it is possible to hook into the compile-time lexical
scope mechanism using C<Perl_blockhook_register>. This is used like
this:

    STATIC void my_start_hook(pTHX_ int full);
    STATIC BHK my_hooks;

    BOOT:
        BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
        Perl_blockhook_register(aTHX_ &my_hooks);

This will arrange to have C<my_start_hook> called at the start of
compiling every lexical scope. The available hooks are:

=over 4

=item C<void bhk_start(pTHX_ int full)>

This is called just after starting a new lexical scope. Note that Perl
code like

    if ($x) { ... }

creates two scopes: the first starts at the C<(> and has C<full == 1>,
the second starts at the C<{> and has C<full == 0>. Both end at the
C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything
pushed onto the save stack by this hook will be popped just before the
scope ends (between the C<pre_> and C<post_end> hooks, in fact).

=item C<void bhk_pre_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just before unwinding the
stack. I<o> is the root of the optree representing the scope; it is a
double pointer so you can replace the OP if you need to.

=item C<void bhk_post_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just after unwinding the
stack. I<o> is as above. Note that it is possible for calls to C<pre_>
and C<post_end> to nest, if there is something on the save stack that
calls string eval.

=item C<void bhk_eval(pTHX_ OP *const o)>

This is called just before starting to compile an C<eval STRING>,
C<do FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
OP that requested the eval, and will normally be an C<OP_ENTEREVAL>,
C<OP_DOFILE> or C<OP_REQUIRE>.

=back

Once you have your hook functions, you need a C<BHK> structure to put
them in. It's best to allocate it statically, since there is no way to
free it once it's registered. The function pointers should be inserted
into this structure using the C<BhkENTRY_set> macro, which will also set
flags indicating which entries are valid. If you do need to allocate
your C<BHK> dynamically for some reason, be sure to zero it before you
start.
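
The reason for setting entries through a macro, and for zeroing a
dynamically allocated structure first, is easier to see in a reduced
model. The sketch below is a toy under stated assumptions: C<toy_hooks>,
C<TOY_HOOK_SET>, C<toy_hooks_init> and C<toy_call_start> are hypothetical
names made up for the illustration and do not correspond to the real
C<BHK> layout or macros.

```c
#include <assert.h>
#include <string.h>

/* Toy hook table; all names are hypothetical and only echo the idea
 * of a structure whose flags say which function pointers are valid. */
typedef void (*start_hook_t)(int full);
typedef void (*end_hook_t)(void);

enum { HOOKF_START = 1 << 0, HOOKF_END = 1 << 1 };

typedef struct {
    start_hook_t start;
    end_hook_t   end;
    unsigned     flags;   /* which of the pointers above are valid */
} toy_hooks;

/* Zero the table so that no entry is marked valid. */
void toy_hooks_init(toy_hooks *hk)
{
    memset(hk, 0, sizeof *hk);
}

/* Store a pointer and mark the entry valid in one step. */
#define TOY_HOOK_SET(hk, field, flag, fn) \
    do { (hk)->field = (fn); (hk)->flags |= (flag); } while (0)

/* Call the start hook only if its flag is set.
 * Returns 1 if the hook was called, 0 otherwise. */
int toy_call_start(const toy_hooks *hk, int full)
{
    if (!(hk->flags & HOOKF_START))
        return 0;
    assert(hk->start != NULL);
    hk->start(full);
    return 1;
}

/* A sample hook that records the argument it was last called with. */
int last_full = -1;
void record_start(int full) { last_full = full; }
```

In this model an unzeroed table could carry a stray flag bit and cause a
call through a garbage pointer, which is why the initialisation step
matters. Clearing a flag bit afterwards switches the entry off again
without touching the stored pointer.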

Once registered, there is no mechanism to switch these hooks off, so if
that is necessary you will need to do this yourself. An entry in C<%^H>
is probably the best way, so the effect is lexically scoped; however it
is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to
temporarily switch entries on and off. You should also be aware that
generally speaking at least one scope will have opened before your
extension is loaded, so you will see some C<pre>/C<post_end> pairs that
didn't have a matching C<start>.

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.
2311 2312Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an 2313op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the 2314subroutines in a package like so: (Thankfully, these are all xsubs, so 2315there is no op tree) 2316 2317 (gdb) print Perl_dump_packsubs(PL_defstash) 2318 2319 SUB attributes::bootstrap = (xsub 0x811fedc 0) 2320 2321 SUB UNIVERSAL::can = (xsub 0x811f50c 0) 2322 2323 SUB UNIVERSAL::isa = (xsub 0x811f304 0) 2324 2325 SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0) 2326 2327 SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0) 2328 2329and C<Perl_dump_all>, which dumps all the subroutines in the stash and 2330the op tree of the main root. 2331 2332=head1 How multiple interpreters and concurrency are supported 2333 2334=head2 Background and PERL_IMPLICIT_CONTEXT 2335 2336The Perl interpreter can be regarded as a closed box: it has an API 2337for feeding it code or otherwise making it do things, but it also has 2338functions for its own use. This smells a lot like an object, and 2339there are ways for you to build Perl so that you can have multiple 2340interpreters, with one interpreter represented either as a C structure, 2341or inside a thread-specific structure. These structures contain all 2342the context, the state of that interpreter. 2343 2344One macro controls the major Perl build flavor: MULTIPLICITY. The 2345MULTIPLICITY build has a C structure that packages all the interpreter 2346state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also 2347normally defined, and enables the support for passing in a "hidden" first 2348argument that represents all three data structures. MULTIPLICITY makes 2349multi-threaded perls possible (with the ithreads threading model, related 2350to the macro USE_ITHREADS.) 2351 2352Two other "encapsulation" macros are the PERL_GLOBAL_STRUCT and 2353PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the 2354former turns on MULTIPLICITY.) 
The PERL_GLOBAL_STRUCT causes all the
internal variables of Perl to be wrapped inside a single global struct,
struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or
the function Perl_GetVars().  The PERL_GLOBAL_STRUCT_PRIVATE goes
one step further: there is still a single struct (allocated in main()
either from heap or from stack) but there are no global data symbols
pointing to it.  In either case the global struct should be initialized
as the very first thing in main() using Perl_init_global_struct() and
correspondingly torn down after perl_free() using
Perl_free_global_struct(); please see F<miniperlmain.c> for usage
details.  You may also need to use C<dVAR> in your coding to "declare
the global variables" when you are using them.  dTHX does this for you
automatically.

=for apidoc Amnh||dVAR

To see whether you have non-const data you can use a BSD (or GNU)
compatible C<nm>:

  nm libperl.a | grep -v ' [TURtr] '

If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>),
you have non-const data.  The symbols the C<grep> removed are as follows:
C<Tt> are I<text>, or code, the C<Rr> are I<read-only> (const) data,
and the C<U> is I<undefined>, external symbols referred to.

The test F<t/porting/libperl.t> does this kind of symbol sanity
checking on C<libperl.a>.

For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
doesn't actually hide all symbols inside a big global struct: some
PerlIO_xxx vtables are left visible.  The PERL_GLOBAL_STRUCT_PRIVATE
then hides everything (see how the PERLIO_FUNCS_DECL is used).

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument.
To 2390enable these two very different ways of building the interpreter, 2391the Perl source (as it does in so many other situations) makes heavy 2392use of macros and subroutine naming conventions. 2393 2394First problem: deciding which functions will be public API functions and 2395which will be private. All functions whose names begin C<S_> are private 2396(think "S" for "secret" or "static"). All other functions begin with 2397"Perl_", but just because a function begins with "Perl_" does not mean it is 2398part of the API. (See L</Internal 2399Functions>.) The easiest way to be B<sure> a 2400function is part of the API is to find its entry in L<perlapi>. 2401If it exists in L<perlapi>, it's part of the API. If it doesn't, and you 2402think it should be (i.e., you need it for your extension), submit an issue at 2403L<https://github.com/Perl/perl5/issues> explaining why you think it should be. 2404 2405Second problem: there must be a syntax so that the same subroutine 2406declarations and calls can pass a structure as their first argument, 2407or pass nothing. To solve this, the subroutines are named and 2408declared in a particular way. Here's a typical start of a static 2409function used within the Perl guts: 2410 2411 STATIC void 2412 S_incline(pTHX_ char *s) 2413 2414STATIC becomes "static" in C, and may be #define'd to nothing in some 2415configurations in the future. 2416 2417A public function (i.e. part of the internal API, but not necessarily 2418sanctioned for use in extensions) begins like this: 2419 2420 void 2421 Perl_sv_setiv(pTHX_ SV* dsv, IV num) 2422 2423C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the 2424details of the interpreter's context. THX stands for "thread", "this", 2425or "thingy", as the case may be. (And no, George Lucas is not involved. :-) 2426The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, 2427or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and 2428their variants. 
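
The naming scheme can be tried out in isolation with a toy model in
plain C.  The names C<TOY_IMPLICIT_CONTEXT>, C<toy_interp>, C<pCTX>,
C<aCTX> and C<Toy_setiv> below are invented for illustration and only
mirror the shape of C<pTHX>, C<aTHX> and friends; Perl's real
definitions live in F<perl.h> and differ in detail.

```c
/* Toy model only: these names are invented and are not Perl's. */
#define TOY_IMPLICIT_CONTEXT 1

typedef struct { int calls; } toy_interp;

#if TOY_IMPLICIT_CONTEXT
#  define pCTX   toy_interp *my_ctx   /* prototype: sole parameter      */
#  define pCTX_  pCTX,                /* prototype: more params follow  */
#  define aCTX   my_ctx               /* argument: sole argument        */
#  define aCTX_  aCTX,                /* argument: more args follow     */
#  define NOTE_CALL() (my_ctx->calls++)
#else
#  define pCTX   void
#  define pCTX_
#  define aCTX
#  define aCTX_
static toy_interp toy_global;
#  define NOTE_CALL() (toy_global.calls++)
#endif

/* Declared once; takes the context only when the build asks for it. */
static void Toy_setiv(pCTX_ int *dst, int value)
{
    NOTE_CALL();
    *dst = value;
}

static int Toy_calls(pCTX)
{
#if TOY_IMPLICIT_CONTEXT
    return my_ctx->calls;
#else
    return toy_global.calls;
#endif
}

/* Short names hide the context argument from callers. */
#define toy_setiv(a, b) Toy_setiv(aCTX_ a, b)
#define toy_calls()     Toy_calls(aCTX)

static int toy_demo(void)
{
    toy_interp interp = { 0 };
    toy_interp *my_ctx = &interp;  /* what a dTHX-like macro would declare */
    int x = 0;

    toy_setiv(&x, 42);             /* same source line in both builds */
    (void)my_ctx;                  /* silence "unused" when the switch is 0 */
    return x * 100 + toy_calls();
}
```

Changing C<TOY_IMPLICIT_CONTEXT> to 0 compiles the same function bodies
and call sites with no context parameter at all, which is the property
the underscore and non-underscore forms exist to provide.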
2429 2430=for apidoc Amnh||aTHX 2431=for apidoc Amnh||aTHX_ 2432=for apidoc Amnh||dTHX 2433=for apidoc Amnh||pTHX 2434=for apidoc Amnh||pTHX_ 2435 2436When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no 2437first argument containing the interpreter's context. The trailing underscore 2438in the pTHX_ macro indicates that the macro expansion needs a comma 2439after the context argument because other arguments follow it. If 2440PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the 2441subroutine is not prototyped to take the extra argument. The form of the 2442macro without the trailing underscore is used when there are no additional 2443explicit arguments. 2444 2445When a core function calls another, it must pass the context. This 2446is normally hidden via macros. Consider C<sv_setiv>. It expands into 2447something like this: 2448 2449 #ifdef PERL_IMPLICIT_CONTEXT 2450 #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b) 2451 /* can't do this for vararg functions, see below */ 2452 #else 2453 #define sv_setiv Perl_sv_setiv 2454 #endif 2455 2456This works well, and means that XS authors can gleefully write: 2457 2458 sv_setiv(foo, bar); 2459 2460and still have it work under all the modes Perl could have been 2461compiled with. 2462 2463This doesn't work so cleanly for varargs functions, though, as macros 2464imply that the number of arguments is known in advance. Instead we 2465either need to spell them out fully, passing C<aTHX_> as the first 2466argument (the Perl core tends to do this with functions like 2467Perl_warner), or use a context-free version. 2468 2469The context-free version of Perl_warner is called 2470Perl_warner_nocontext, and does not take the extra argument. Instead 2471it does C<dTHX;> to get the context from thread-local storage. We 2472C<#define warner Perl_warner_nocontext> so that extensions get source 2473compatibility at the expense of performance. 
(Passing an arg is 2474cheaper than grabbing it from thread-local storage.) 2475 2476You can ignore [pad]THXx when browsing the Perl headers/sources. 2477Those are strictly for use within the core. Extensions and embedders 2478need only be aware of [pad]THX. 2479 2480=head2 So what happened to dTHR? 2481 2482=for apidoc Amnh||dTHR 2483 2484C<dTHR> was introduced in perl 5.005 to support the older thread model. 2485The older thread model now uses the C<THX> mechanism to pass context 2486pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and 2487later still have it for backward source compatibility, but it is defined 2488to be a no-op. 2489 2490=head2 How do I use all this in extensions? 2491 2492When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call 2493any functions in the Perl API will need to pass the initial context 2494argument somehow. The kicker is that you will need to write it in 2495such a way that the extension still compiles when Perl hasn't been 2496built with PERL_IMPLICIT_CONTEXT enabled. 2497 2498There are three ways to do this. First, the easy but inefficient way, 2499which is also the default, in order to maintain source compatibility 2500with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX 2501and aTHX_ macros to call a function that will return the context. 2502Thus, something like: 2503 2504 sv_setiv(sv, num); 2505 2506in your extension will translate to this when PERL_IMPLICIT_CONTEXT is 2507in effect: 2508 2509 Perl_sv_setiv(Perl_get_context(), sv, num); 2510 2511or to this otherwise: 2512 2513 Perl_sv_setiv(sv, num); 2514 2515You don't have to do anything new in your extension to get this; since 2516the Perl library provides Perl_get_context(), it will all just 2517work. 
2518 2519The second, more efficient way is to use the following template for 2520your Foo.xs: 2521 2522 #define PERL_NO_GET_CONTEXT /* we want efficiency */ 2523 #include "EXTERN.h" 2524 #include "perl.h" 2525 #include "XSUB.h" 2526 2527 STATIC void my_private_function(int arg1, int arg2); 2528 2529 STATIC void 2530 my_private_function(int arg1, int arg2) 2531 { 2532 dTHX; /* fetch context */ 2533 ... call many Perl API functions ... 2534 } 2535 2536 [... etc ...] 2537 2538 MODULE = Foo PACKAGE = Foo 2539 2540 /* typical XSUB */ 2541 2542 void 2543 my_xsub(arg) 2544 int arg 2545 CODE: 2546 my_private_function(arg, 10); 2547 2548Note that the only two changes from the normal way of writing an 2549extension is the addition of a C<#define PERL_NO_GET_CONTEXT> before 2550including the Perl headers, followed by a C<dTHX;> declaration at 2551the start of every function that will call the Perl API. (You'll 2552know which functions need this, because the C compiler will complain 2553that there's an undeclared identifier in those functions.) No changes 2554are needed for the XSUBs themselves, because the XS() macro is 2555correctly defined to pass in the implicit context if needed. 2556 2557The third, even more efficient way is to ape how it is done within 2558the Perl guts: 2559 2560 2561 #define PERL_NO_GET_CONTEXT /* we want efficiency */ 2562 #include "EXTERN.h" 2563 #include "perl.h" 2564 #include "XSUB.h" 2565 2566 /* pTHX_ only needed for functions that call Perl API */ 2567 STATIC void my_private_function(pTHX_ int arg1, int arg2); 2568 2569 STATIC void 2570 my_private_function(pTHX_ int arg1, int arg2) 2571 { 2572 /* dTHX; not needed here, because THX is an argument */ 2573 ... call Perl API functions ... 2574 } 2575 2576 [... etc ...] 

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int             arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument.  Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit
arguments.

If one is compiling Perl with C<-DPERL_GLOBAL_STRUCT>, the C<dVAR>
definition is needed if the Perl global variables (see F<perlvars.h>
or F<globvar.sym>) are accessed in the function and C<dTHX> is not
used (C<dTHX> includes C<dVAR> if necessary).  One notices
the need for C<dVAR> only with the said compile-time define, because
otherwise the Perl global variables are visible as-is.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards.  If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter.
This is done by calling the C<PERL_SET_CONTEXT> macro in that 2617thread as the first thing you do: 2618 2619 /* do this before doing anything else with some_perl */ 2620 PERL_SET_CONTEXT(some_perl); 2621 2622 ... other Perl API calls on some_perl go here ... 2623 2624=head2 Future Plans and PERL_IMPLICIT_SYS 2625 2626Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything 2627that the interpreter knows about itself and pass it around, so too are 2628there plans to allow the interpreter to bundle up everything it knows 2629about the environment it's running on. This is enabled with the 2630PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on 2631Windows. 2632 2633This allows the ability to provide an extra pointer (called the "host" 2634environment) for all the system calls. This makes it possible for 2635all the system stuff to maintain their own state, broken down into 2636seven C structures. These are thin wrappers around the usual system 2637calls (see F<win32/perllib.c>) for the default perl executable, but for a 2638more ambitious host (like the one that would do fork() emulation) all 2639the extra work needed to pretend that different interpreters are 2640actually different "processes", would be done here. 2641 2642The Perl engine/interpreter and the host are orthogonal entities. 2643There could be one or more interpreters in a process, and one or 2644more "hosts", with free association between them. 2645 2646=head1 Internal Functions 2647 2648All of Perl's internal functions which will be exposed to the outside 2649world are prefixed by C<Perl_> so that they will not conflict with XS 2650functions or functions used in a program in which Perl is embedded. 2651Similarly, all global variables begin with C<PL_>. (By convention, 2652static functions start with C<S_>.) 

Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
either with or without the C<Perl_> prefix, thanks to a bunch of defines
that live in F<embed.h>.  Note that extension code should I<not> set
C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
breakage of the XS in each new perl release.

The file F<embed.h> is generated automatically from
F<embed.pl> and F<embed.fnc>.  F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces.  It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well.  Here's a sample entry from
that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The first column is a set of flags, the second column the return type,
the third column the name.  Columns after that are the arguments.
The flags are documented at the top of F<embed.fnc>.

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3)-style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

        IVdf            IV in decimal
        UVuf            UV in decimal
        UVof            UV in octal
        UVxf            UV in hexadecimal
        NVef            NV %e-like
        NVff            NV %f-like
        NVgf            NV %g-like

=for apidoc Amnh||IVdf
=for apidoc Amnh||UVuf
=for apidoc Amnh||UVof
=for apidoc Amnh||UVxf
=for apidoc Amnh||NVef
=for apidoc Amnh||NVff
=for apidoc Amnh||NVgf

These will take care of 64-bit integers and long doubles.
For example:

        printf("IV is %" IVdf "\n", iv);

The C<IVdf> will expand to whatever is the correct format for the IVs.
Note that the spaces are required around the format in case the code is
compiled with C++, to maintain compliance with its standard.

Note that there are different "long doubles": Perl will use
whatever the compiler has.

If you are printing addresses of pointers, use %p or UVxf combined
with PTR2UV().

=head2 Formatted Printing of SVs

The contents of SVs may be printed using the C<SVf> format, like so:

 Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg));

where C<err_msg> is an SV.

=for apidoc Amnh||SVf
=for apidoc Amh||SVfARG|SV *sv

Not all scalar types are printable.  Simple values certainly are: one of
IV, UV, NV, or PV.  Also, if the SV is a reference to some value,
either it will be dereferenced and the value printed, or information
about the type of that value and its address is displayed.  The results
of printing any other type of SV are undefined and likely to lead to an
interpreter crash.  NVs are printed using a C<%g>-ish format.

Note that the spaces are required around the C<SVf> in case the code is
compiled with C++, to maintain compliance with its standard.

Note that any filehandle being printed to under UTF-8 must be expecting
UTF-8 in order to get good results and avoid Wide-character warnings.
One way to do this for typical filehandles is to invoke perl with the
C<-C> parameter.  (See L<perlrun/-C [numberE<sol>list]>.)

You can use this to concatenate two scalars:

 SV *var1 = get_sv("var1", GV_ADD);
 SV *var2 = get_sv("var2", GV_ADD);
 SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf,
                     SVfARG(var1), SVfARG(var2));

=head2 Formatted Printing of Strings

If you just want the bytes printed in a 7-bit NUL-terminated string, you can
just use C<%s> (assuming they are all really only 7-bit).  But if there is a
possibility the value will be encoded as UTF-8 or contains bytes above
C<0x7F> (and therefore 8-bit), you should instead use the C<UTF8f> format.
And as its parameter, use the C<UTF8fARG()> macro:

 const char *msg;

 /* U+2018: \xE2\x80\x98 LEFT SINGLE QUOTATION MARK
    U+2019: \xE2\x80\x99 RIGHT SINGLE QUOTATION MARK */
 if (can_utf8)
   msg = "\xE2\x80\x98Uses fancy quotes\xE2\x80\x99";
 else
   msg = "'Uses simple quotes'";

 Perl_croak(aTHX_ "The message is: %" UTF8f "\n",
                  UTF8fARG(can_utf8, strlen(msg), msg));

The first parameter to C<UTF8fARG> is a boolean: 1 if the string is in
UTF-8; 0 if the string is in native byte encoding (Latin1).
The second parameter is the number of bytes in the string to print.
And the third and final parameter is a pointer to the first byte in the
string.

Note that any filehandle being printed to under UTF-8 must be expecting
UTF-8 in order to get good results and avoid Wide-character warnings.
One way to do this for typical filehandles is to invoke perl with the
C<-C> parameter.  (See L<perlrun/-C [numberE<sol>list]>.)

=head2 Formatted Printing of C<Size_t> and C<SSize_t>

The most general way to do this is to cast them to a UV or IV, and
print as in the
L<previous section|/Formatted Printing of IVs, UVs, and NVs>.
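
The cast-then-format pattern can be sketched outside Perl with standard
C only.  In the sketch below C<toy_UV> and C<TOY_UVuf> are invented
stand-ins for C<UV> and C<UVuf> (here simply C<unsigned long long> and
C<"llu">); in real XS code you would use the Perl typedef and macro
instead.

```c
#include <stdio.h>
#include <stddef.h>

/* Toy stand-ins for Perl's UV typedef and UVuf format macro. */
typedef unsigned long long toy_UV;
#define TOY_UVuf "llu"

/* Format a size_t portably: cast to the wide integer type, then use the
 * matching format macro.  Adjacent string literals are concatenated by
 * the compiler, which is why the macro sits between two quoted pieces. */
static int toy_format_len(char *buf, size_t bufsize, size_t len)
{
    return snprintf(buf, bufsize, "len is %" TOY_UVuf "\n", (toy_UV)len);
}
```

The cast matters because the width of C<size_t> varies between
platforms while the format macro is tied to one specific type.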

But if you're using C<PerlIO_printf()>, it's less typing and visual
clutter to use the C<%z> length modifier (for I<siZe>):

        PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len);

This modifier is not portable, so its use should be restricted to
C<PerlIO_printf()>.

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

        PTR2UV(pointer)
        PTR2IV(pointer)
        PTR2NV(pointer)
        INT2PTR(pointertotype, integer)

=for apidoc Amh|void *|INT2PTR|type|int value
=for apidoc Amh|UV|PTR2UV|void *
=for apidoc Amh|IV|PTR2IV|void *
=for apidoc Amh|NV|PTR2NV|void *

For example:

        IV  iv = ...;
        SV *sv = INT2PTR(SV*, iv);

and

        AV *av = ...;
        UV  uv = PTR2UV(av);

=head2 Exception Handling

There are a couple of macros to do very basic exception handling in XS
modules.  You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
be able to use these macros:

        #define NO_XSLOCKS
        #include "XSUB.h"

You can use these macros if you call code that may croak, but you need
to do some cleanup before giving control back to Perl.  For example:

        dXCPT;    /* set up necessary variables */

        XCPT_TRY_START {
          code_that_may_croak();
        } XCPT_TRY_END

        XCPT_CATCH
        {
          /* do cleanup here */
          XCPT_RETHROW;
        }

Note that you always have to rethrow an exception that has been
caught.  Using these macros, it is not possible to just catch the
exception and ignore it.  If you have to ignore the exception, you
have to use the C<call_*> function.

The advantage of using the above macros is that you don't have
to set up an extra function for C<call_*>, and that using these
macros is faster than using C<call_*>.
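
The "catch, clean up, rethrow" control flow can be modelled with
standard C<setjmp>/C<longjmp>.  This is only a toy: C<toy_croak>,
C<toy_top> and the other names are invented, and the sketch makes no
claim about how the C<XCPT_*> macros are implemented.  It shows why the
cleanup block runs and why control then continues to the outer handler.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf *toy_top = NULL;     /* innermost active handler, if any */
static int toy_cleanups_run = 0;

/* Stand-in for croak(): jump to the innermost handler. */
static void toy_croak(void)
{
    if (toy_top)
        longjmp(*toy_top, 1);
    abort();                        /* nobody is listening */
}

static void toy_risky(int fail)
{
    if (fail)
        toy_croak();
}

/* Acquire a resource, run risky code, always release the resource. */
static void toy_guarded(int fail)
{
    jmp_buf here;
    jmp_buf *saved = toy_top;       /* set before setjmp, so still valid */
    char *scratch = malloc(16);     /* after a longjmp back into here    */

    toy_top = &here;
    if (setjmp(here) == 0) {        /* "try" */
        toy_risky(fail);
        toy_top = saved;
        free(scratch);              /* normal-path release */
    } else {                        /* "catch" */
        toy_top = saved;            /* restore the outer handler first */
        free(scratch);              /* cleanup */
        toy_cleanups_run++;
        toy_croak();                /* rethrow to the outer handler */
    }
}

/* Outermost caller: returns 1 if an exception reached it, else 0. */
static int toy_run(int fail)
{
    jmp_buf outer;
    jmp_buf *saved = toy_top;

    toy_top = &outer;
    if (setjmp(outer) == 0) {
        toy_guarded(fail);
        toy_top = saved;
        return 0;
    }
    toy_top = saved;
    return 1;
}
```

Restoring the previous handler before rethrowing is the step that makes
the rethrow land in the caller rather than looping back into the same
catch block.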
2850 2851=head2 Source Documentation 2852 2853There's an effort going on to document the internal functions and 2854automatically produce reference manuals from them -- L<perlapi> is one 2855such manual which details all the functions which are available to XS 2856writers. L<perlintern> is the autogenerated manual for the functions 2857which are not part of the API and are supposedly for internal use only. 2858 2859=for comment 2860skip apidoc 2861The following is an example and shouldn't be read as a real apidoc line 2862 2863Source documentation is created by putting POD comments into the C 2864source, like this: 2865 2866 /* 2867 =for apidoc sv_setiv 2868 2869 Copies an integer into the given SV. Does not handle 'set' magic. See 2870 L<perlapi/sv_setiv_mg>. 2871 2872 =cut 2873 */ 2874 2875Please try and supply some documentation if you add functions to the 2876Perl core. 2877 2878=head2 Backwards compatibility 2879 2880The Perl API changes over time. New functions are 2881added or the interfaces of existing functions are 2882changed. The C<Devel::PPPort> module tries to 2883provide compatibility code for some of these changes, so XS writers don't 2884have to code it themselves when supporting multiple versions of Perl. 2885 2886C<Devel::PPPort> generates a C header file F<ppport.h> that can also 2887be run as a Perl script. To generate F<ppport.h>, run: 2888 2889 perl -MDevel::PPPort -eDevel::PPPort::WriteFile 2890 2891Besides checking existing XS code, the script can also be used to retrieve 2892compatibility information for various API calls using the C<--api-info> 2893command line switch. For example: 2894 2895 % perl ppport.h --api-info=sv_magicext 2896 2897For details, see C<perldoc ppport.h>. 2898 2899=head1 Unicode Support 2900 2901Perl 5.6.0 introduced Unicode support. It's important for porters and XS 2902writers to understand this support and make sure that the code they 2903write does not corrupt Unicode data. 

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII.  Most of
us did, anyway.  The big problem with ASCII is that it's American.  Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet.  What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255.  Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more.  There are several ways of representing these
characters, and the one Perl uses is called UTF-8.  UTF-8 uses
a variable number of bytes to represent a character.  You can learn more
about Unicode and Perl's Unicode model in L<perlunicode>.

(On EBCDIC platforms, Perl uses instead UTF-EBCDIC, which is a form of
UTF-8 adapted for EBCDIC platforms.  Below, we just talk about UTF-8.
UTF-EBCDIC is like UTF-8, but the details are different.  The macros
hide the differences from you; just remember that the particular numbers
and bit patterns presented below will differ in UTF-EBCDIC.)

=head2 How can I recognise a UTF-8 string?

You can't.  This is because UTF-8 data is stored in bytes just like
non-UTF-8 data.  The Unicode character 200, (C<0xC8> for you hex types)
capital E with a grave accent, is represented by the two bytes
C<v195.136>.  Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well.  So you can't tell just by looking -- this
is what makes Unicode input an interesting problem.

In general, you either have to know what you're dealing with, or you
have to guess.  The API function C<is_utf8_string> can help; it'll tell
you if a string contains only valid UTF-8 characters, and the chances
of a non-UTF-8 string looking like valid UTF-8 become very small very
quickly with increasing string length.  On a character-by-character
basis, C<isUTF8_CHAR>
will tell you whether the current character in a string is valid UTF-8.

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character.  Characters with values 0...127 are stored in one
byte, just like good ol' ASCII.  Character 128 is stored as
C<v194.128>; this continues up to character 191, which is
C<v194.191>.  Now we've run out of bits (191 is binary
C<10111111>) so we move on; character 192 is C<v195.128>.  And
so it goes on, moving to three bytes at character 2048.
L<perlunicode/Unicode Encodings> has pictures of how this works.

Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over.  You're on your own about bounds checking, though, so don't use it
lightly.
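
To make the length rule concrete, here is a small stand-alone sketch in
plain C of what a skip computation does for ordinary (non-EBCDIC)
UTF-8.  The functions C<toy_utf8_skip> and C<toy_utf8_count> are
invented for illustration; they are not Perl's C<UTF8SKIP> or
C<utf8_hop>, and they handle only sequences of up to four bytes.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with 'lead'.
 * Toy version: covers 1- to 4-byte sequences only. */
static size_t toy_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)                  return 1;   /* 0xxxxxxx: ASCII */
    if (lead >= 0xC0 && lead <= 0xDF) return 2;   /* 110xxxxx        */
    if (lead >= 0xE0 && lead <= 0xEF) return 3;   /* 1110xxxx        */
    if (lead >= 0xF0 && lead <= 0xF7) return 4;   /* 11110xxx        */
    return 1;   /* continuation or unsupported byte: step one byte */
}

/* Count the characters in [s, end).  Unlike a bare hop, this checks
 * bounds and returns (size_t)-1 if the last sequence is cut short. */
static size_t toy_utf8_count(const char *s, const char *end)
{
    size_t n = 0;

    while (s < end) {
        size_t skip = toy_utf8_skip((unsigned char)*s);
        if (skip > (size_t)(end - s))
            return (size_t)-1;
        s += skip;
        n++;
    }
    return n;
}
```

Applied to the five-byte string C<"\305\233\340\240\201"> from the
example above, the first call yields 2 and the call on the third byte
yields 3, so the string holds two characters.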

All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (the C<UTF8_IS_INVARIANT()> is a macro that tests
whether the byte is encoded as a single byte even in UTF-8):

    U8 *utf;     /* Initialize this to point to the beginning of the
                    sequence to convert */
    U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence
                    pointed to by 'utf' */
    UV uv;       /* Returned code point; note: a UV, not a U8, not a
                    char */
    STRLEN len;  /* Returned length of character in bytes */

    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
value of the character; the inverse function C<uvchr_to_utf8> is available
for putting a UV into UTF-8:

    if (!UVCHR_IS_INVARIANT(uv))
        /* Must treat this as UTF-8 */
        utf8 = uvchr_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters.  You may not skip over UTF-8 characters in this case.  If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains the two bytes C<v195.136>
(the UTF-8 encoding of character 200), and you skip that character, you
can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!

(Note that we don't have to test for invariant characters in the
examples above.  The functions work on any well-formed UTF-8 input.
It's just that it's faster to avoid the function overhead when it's not
needed.)

=head2 How does Perl store UTF-8 strings?
3023 3024Currently, Perl deals with UTF-8 strings and non-UTF-8 strings 3025slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the 3026string is internally encoded as UTF-8. Without it, the byte value is the 3027codepoint number and vice versa. This flag is only meaningful if the SV 3028is C<SvPOK> or immediately after stringification via C<SvPV> or a 3029similar macro. You can check and manipulate this flag with the 3030following macros: 3031 3032 SvUTF8(sv) 3033 SvUTF8_on(sv) 3034 SvUTF8_off(sv) 3035 3036This flag has an important effect on Perl's treatment of the string: if 3037UTF-8 data is not properly distinguished, regular expressions, 3038C<length>, C<substr> and other string handling operations will have 3039undesirable (wrong) results. 3040 3041The problem comes when you have, for instance, a string that isn't 3042flagged as UTF-8, and contains a byte sequence that could be UTF-8 -- 3043especially when combining non-UTF-8 and UTF-8 strings. 3044 3045Never forget that the C<SVf_UTF8> flag is separate from the PV value; you 3046need to be sure you don't accidentally knock it off while you're 3047manipulating SVs. More specifically, you cannot expect to do this: 3048 3049 SV *sv; 3050 SV *nsv; 3051 STRLEN len; 3052 char *p; 3053 3054 p = SvPV(sv, len); 3055 frobnicate(p); 3056 nsv = newSVpvn(p, len); 3057 3058The C<char*> string does not tell you the whole story, and you can't 3059copy or reconstruct an SV just by copying the string value. Check if the 3060old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act 3061accordingly: 3062 3063 p = SvPV(sv, len); 3064 is_utf8 = SvUTF8(sv); 3065 frobnicate(p, is_utf8); 3066 nsv = newSVpvn(p, len); 3067 if (is_utf8) 3068 SvUTF8_on(nsv); 3069 3070In the above, your C<frobnicate> function has been changed to be made 3071aware of whether or not it's dealing with UTF-8 data, so that it can 3072handle the string appropriately. 
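
Why the flag has to travel with the bytes can be shown with a toy model
in plain C.  The C<toy_sv> struct and its helpers are invented for
illustration and are not Perl's SV; the C<is_utf8> member merely plays
the role of C<SVf_UTF8>.

```c
#include <stddef.h>
#include <string.h>

/* Toy scalar: a byte buffer plus a flag saying how to read it. */
typedef struct {
    char   bytes[16];
    size_t len;         /* length in bytes */
    int    is_utf8;     /* stand-in for SVf_UTF8 */
} toy_sv;

/* Build a toy scalar from a buffer (truncated to fit) and a flag. */
static toy_sv toy_new(const char *p, size_t len, int is_utf8)
{
    toy_sv sv;

    memset(&sv, 0, sizeof sv);
    if (len > sizeof sv.bytes)
        len = sizeof sv.bytes;
    memcpy(sv.bytes, p, len);
    sv.len = len;
    sv.is_utf8 = is_utf8;
    return sv;
}

/* Length in characters: bytes if unflagged, else count every byte that
 * is not a UTF-8 continuation byte (10xxxxxx). */
static size_t toy_length(const toy_sv *sv)
{
    size_t i, chars = 0;

    if (!sv->is_utf8)
        return sv->len;
    for (i = 0; i < sv->len; i++)
        if (((unsigned char)sv->bytes[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}
```

Copying the two bytes C<0xC3 0x88> (UTF-8 for U+00C8) without the flag
gives a two-character string, while copying them with the flag gives
the original one-character string, which is the difference the
C<SvUTF8_on> call above preserves.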

Since passing an SV to an XS function and copying its data is not
enough to preserve the UTF8 flag, passing just a S<C<char *>> to an XS
function is even less adequate.

For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the
string in an SV is to be I<treated> as UTF-8. This takes into account
if the call to the XS function is being made from within the scope of
L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the
UTF-8 string are to be exposed, rather than the character they
represent. But this pragma should only really be used for debugging and
perhaps low-level testing at the byte level. Hence most XS code need
not concern itself with this, but various areas of the perl core do need
to support it.

And this isn't the whole story. Starting in Perl v5.12, strings that
aren't encoded in UTF-8 may also be treated as Unicode under various
conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>).
This is only really a problem for characters whose ordinals are between
128 and 255, and their behavior varies under ASCII versus Unicode rules
in ways that your code cares about (see L<perlunicode/The "Unicode Bug">).
There is no published API for dealing with this, as it is subject to
change, but you can look at the code for C<pp_lc> in F<pp.c> for an
example as to how it's currently done.

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
the non-UTF-8 strings to UTF-8.
If you've got an SV, the easiest way to do
this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems in deficient code.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

=head2 How do I compare strings?

L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic
comparison of two SVs, and handle UTF-8ness properly. Note, however,
that Unicode specifies a much fancier mechanism for collation, available
via the L<Unicode::Collate> module.

To just compare two strings for equality/non-equality, you can just use
L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memEQ> as usual,
except the strings must be both UTF-8 or not UTF-8 encoded.

To compare two strings case-insensitively, use
L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have
the same UTF-8ness).

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8
or not. But you can tell if an SV is to be treated as UTF-8 by calling
C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar
macro.
And, you can tell if an SV is actually UTF-8 (even if it is not to
be treated as such) by looking at its C<SvUTF8> flag (again after
stringifying it). Don't forget to set the flag if something should be
UTF-8.
Treat the flag as part of the PV, even though it's not -- if you pass on
the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character UV to a UTF-8 string, B<always> use
C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is
tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded, and then combine them.

=back

=head1 Custom Operators

Custom operator support is an experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>.)

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure -- unary, binary, list and
so on -- you like.

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to do those changes yourself, after
Perl has compiled the program.
You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> to determine which custom op
it is dealing with. You should create an C<XOP> structure for each
ppaddr you use, set the properties of the custom op with
C<XopENTRY_set>, and register the structure against the ppaddr using
C<Perl_custom_op_register>. A trivial example might look like:

    static XOP my_xop;
    static OP *my_pp(pTHX);

    BOOT:
        XopENTRY_set(&my_xop, xop_name, "myxop");
        XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
        Perl_custom_op_register(aTHX_ my_pp, &my_xop);

The available fields in the structure are:

=over 4

=item xop_name

A short name for your op. This will be included in some error messages,
and will also be returned as C<< $op->name >> by the L<B|B> module, so
it will appear in the output of modules like L<B::Concise|B::Concise>.

=item xop_desc

A short description of the function of the op.

=item xop_class

Which of the various C<*OP> structures this op uses.
This should be one of
the C<OA_*> constants from F<op.h>, namely

=over 4

=item OA_BASEOP

=item OA_UNOP

=item OA_BINOP

=item OA_LOGOP

=item OA_LISTOP

=item OA_PMOP

=item OA_SVOP

=item OA_PADOP

=item OA_PVOP_OR_SVOP

This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead.

=item OA_LOOP

=item OA_COP

=back

The other C<OA_*> constants should not be used.

=item xop_peep

This member is of type C<Perl_cpeep_t>, which expands to C<void
(*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)>. If it is set, this function
will be called from C<Perl_rpeep> when ops of this type are encountered
by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.

=back

C<B::Generate> directly supports the creation of custom ops by name.

=head1 Stacks

Descriptions above occasionally refer to "the stack", but there are in fact
many stack-like data structures within the perl interpreter. When otherwise
unqualified, "the stack" usually refers to the value stack.

The various stacks have different purposes, and operate in slightly different
ways. Their differences are noted below.

=head2 Value Stack

This stack stores the values that regular perl code is operating on, usually
intermediate values of expressions within a statement. The stack itself is
formed of an array of SV pointers.

The base of this stack is pointed to by the interpreter variable
C<PL_stack_base>, of type C<SV **>.

The head of the stack is C<PL_stack_sp>, and points to the most
recently-pushed item.

Items are pushed to the stack by using the C<PUSHs()> macro or its variants
described above: C<XPUSHs()>, C<mPUSHs()>, C<mXPUSHs()> and the typed
versions. Note carefully that the non-C<X> versions of these macros do not
check the size of the stack and assume it to be big enough. These must be
paired with a suitable check of the stack's size, such as the C<EXTEND> macro
to ensure it is large enough. For example:

    EXTEND(SP, 4);
    mPUSHi(10);
    mPUSHi(20);
    mPUSHi(30);
    mPUSHi(40);

This is slightly more performant than making four separate checks in four
separate C<mXPUSHi()> calls.

As a further performance optimisation, the various C<PUSH> macros all operate
using a local variable C<SP>, rather than the interpreter-global variable
C<PL_stack_sp>. This variable is declared by the C<dSP> macro - though it is
normally implied by XSUBs and similar so it is rare you have to consider it
directly. Once declared, the C<PUSH> macros will operate only on this local
variable, so before invoking any other perl core functions you must use the
C<PUTBACK> macro to return the value from the local C<SP> variable back to
the interpreter variable. Similarly, after calling a perl core function which
may have had reason to move the stack or push/pop values to it, you must use
the C<SPAGAIN> macro which refreshes the local C<SP> value back from the
interpreter one.

Items are popped from the stack by using the C<POPs> macro or its typed
versions. There is also a macro C<TOPs> that inspects the topmost item without
removing it.

Note specifically that SV pointers on the value stack do not contribute to the
overall reference count of the xVs being referred to.
If newly-created xVs are
being pushed to the stack you must arrange for them to be destroyed at a
suitable time; usually by using one of the C<mPUSH*> macros or C<sv_2mortal()>
to mortalise the xV.

=head2 Mark Stack

The value stack stores individual perl scalar values as temporaries between
expressions. Some perl expressions operate on entire lists; for that purpose
we need to know where on the stack each list begins. This is the purpose of the
mark stack.

The mark stack stores integers as I32 values, which are the height of the
value stack at the time before the list began; thus the mark itself actually
points to the value stack entry one before the list. The list itself starts at
C<mark + 1>.

The base of this stack is pointed to by the interpreter variable
C<PL_markstack>, of type C<I32 *>.

The head of the stack is C<PL_markstack_ptr>, and points to the most
recently-pushed item.

Items are pushed to the stack by using the C<PUSHMARK()> macro. Even though
the stack itself stores (value) stack indices as integers, the C<PUSHMARK>
macro should be given a stack pointer directly; it will calculate the index
offset by comparing to the C<PL_stack_sp> variable. Thus almost always the
code to perform this is

    PUSHMARK(SP);

Items are popped from the stack by the C<POPMARK> macro. There is also a macro
C<TOPMARK> that inspects the topmost item without removing it. These macros
return I32 index values directly. There is also the C<dMARK> macro which
declares a new SV double-pointer variable, called C<mark>, which points at the
marked stack slot; this is the usual macro that C code will use when operating
on lists given on the stack.

As noted above, the C<mark> variable itself will point at the most recently
pushed value on the value stack before the list begins, and so the list itself
starts at C<mark + 1>.
The values of the list may be iterated by code such as

    for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) {
        SV *item = *svp;
        ...
    }

Note specifically in the case that the list is already empty, C<mark> will
equal C<PL_stack_sp>.

Because the C<mark> variable is converted to a pointer on the value stack,
extra care must be taken if C<EXTEND> or any of the C<XPUSH> macros are
invoked within the function, because the stack may need to be moved to
extend it and so the existing pointer will now be invalid. If this may be a
problem, a possible solution is to track the mark offset as an integer and
recompute the mark pointer later on, after the stack has been moved.

    I32 markoff = POPMARK;

    ...

    SV **mark = PL_stack_base + markoff;

=head2 Temporaries Stack

As noted above, xV references on the main value stack do not contribute to the
reference count of an xV, and so another mechanism is used to track when
temporary values which live on the stack must be released. This is the job of
the temporaries stack.

The temporaries stack stores pointers to xVs whose reference counts will be
decremented soon.

The base of this stack is pointed to by the interpreter variable
C<PL_tmps_stack>, of type C<SV **>.

The head of the stack is indexed by C<PL_tmps_ix>, an integer which stores the
index in the array of the most recently-pushed item.

There is no public API to directly push items to the temporaries stack. Instead,
the API function C<sv_2mortal()> is used to mortalize an xV, adding its
address to the temporaries stack.

Likewise, there is no public API to read values from the temporaries stack.
Instead, the macros C<SAVETMPS> and C<FREETMPS> are used.
The C<SAVETMPS>
macro establishes the base levels of the temporaries stack, by capturing the
current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous
value to the save stack. Thereafter, whenever C<FREETMPS> is invoked all of
the temporaries that have been pushed since that level are reclaimed.

While it is common to see these two macros in pairs within an C<ENTER>/
C<LEAVE> pair, it is not necessary to match them. It is permitted to invoke
C<FREETMPS> multiple times since the most recent C<SAVETMPS>; for example in a
loop iterating over elements of a list. While you can invoke C<SAVETMPS>
multiple times within a scope pair, it is unlikely to be useful. Subsequent
invocations will move the temporaries floor further up, thus effectively
trapping the existing temporaries to only be released at the end of the scope.

=head2 Save Stack

The save stack is used by perl to implement the C<local> keyword and other
similar behaviours; any cleanup operations that need to be performed when
leaving the current scope. Items pushed to this stack generally capture the
current value of some internal variable or state, which will be restored when
the scope is unwound due to leaving, C<return>, C<die>, C<goto> or other
reasons.

Whereas other perl internal stacks store individual items all of the same type
(usually SV pointers or integers), the items pushed to the save stack are
formed of many different types, having multiple fields to them. For example,
the C<SAVEt_INT> type needs to store both the address of the C<int> variable
to restore, and the value to restore it to. This information could have been
stored using fields of a C<struct>, but would have to be large enough to store
three pointers in the largest case, which would waste a lot of space in most
of the smaller cases.

Instead, the stack stores information in a variable-length encoding of C<ANY>
structures. The final value pushed is stored in the C<UV> field which encodes
the kind of item held by the preceding items; the count and types of which
will depend on what kind of item is being stored. The kind field is pushed
last because that will be the first field to be popped when unwinding items
from the stack.

The base of this stack is pointed to by the interpreter variable
C<PL_savestack>, of type C<ANY *>.

The head of the stack is indexed by C<PL_savestack_ix>, an integer which
stores the index in the array at which the next item should be pushed. (Note
that this is different to most other stacks, which reference the most
recently-pushed item).

Items are pushed to the save stack by using the various C<SAVE...()> macros.
Many of these macros take a variable and store both its address and current
value on the save stack, ensuring that value gets restored on scope exit.

    SAVEI8(i8)
    SAVEI16(i16)
    SAVEI32(i32)
    SAVEINT(i)
    ...

There are also a variety of other special-purpose macros which save particular
types or values of interest. C<SAVETMPS> has already been mentioned above.
Others include C<SAVEFREEPV> which arranges for a PV (i.e. a string buffer) to
be freed, or C<SAVEDESTRUCTOR> which arranges for a given function pointer to
be invoked on scope exit. A full list of such macros can be found in
F<scope.h>.

There is no public API for popping individual values or items from the save
stack. Instead, via the scope stack, the C<ENTER> and C<LEAVE> pair form a way
to start and stop nested scopes. Leaving a nested scope via C<LEAVE> will
restore all of the saved values that had been pushed since the most recent
C<ENTER>.
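
The variable-length encoding can be illustrated with a small standalone
model in plain C. This is only a sketch of the idea, not perl's actual
implementation: every C<toy_*> and C<TOY_*> name is hypothetical, the
stack has a fixed size, and only two kinds of item exist. One kind
occupies three slots and the other two, and in both the kind tag is
pushed last so that it is the first thing popped when unwinding:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy counterpart of ANY: one slot can hold any of these.
 * All names in this sketch are hypothetical. */
typedef union {
    void         *any_ptr;
    int           any_int;
    unsigned long any_uv;
} toy_any;

enum { TOY_SAVEt_INT = 1, TOY_SAVEt_FREEPV = 2 };

#define TOY_SAVESTACK_MAX 64
static toy_any toy_savestack[TOY_SAVESTACK_MAX]; /* no growth in this sketch */
static int     toy_savestack_ix;  /* index at which the NEXT slot is pushed */

/* Three slots: address, old value, then the kind tag. */
static void toy_save_int(int *p)
{
    assert(toy_savestack_ix + 3 <= TOY_SAVESTACK_MAX);
    toy_savestack[toy_savestack_ix++].any_ptr = p;
    toy_savestack[toy_savestack_ix++].any_int = *p;
    toy_savestack[toy_savestack_ix++].any_uv  = TOY_SAVEt_INT;
}

/* Two slots: buffer to free on scope exit, then the kind tag. */
static void toy_save_freepv(char *pv)
{
    assert(toy_savestack_ix + 2 <= TOY_SAVESTACK_MAX);
    toy_savestack[toy_savestack_ix++].any_ptr = pv;
    toy_savestack[toy_savestack_ix++].any_uv  = TOY_SAVEt_FREEPV;
}

/* Unwind back to old_ix. The tag is popped first; it says how many
 * further slots belong to the item and how to interpret them. */
static void toy_leave_scope(int old_ix)
{
    while (toy_savestack_ix > old_ix) {
        unsigned long kind = toy_savestack[--toy_savestack_ix].any_uv;

        switch (kind) {
        case TOY_SAVEt_INT: {
            int  val = toy_savestack[--toy_savestack_ix].any_int;
            int *p   = toy_savestack[--toy_savestack_ix].any_ptr;
            *p = val;                       /* restore the old value */
            break;
        }
        case TOY_SAVEt_FREEPV:
            free(toy_savestack[--toy_savestack_ix].any_ptr);
            break;
        default:
            assert(0 && "unknown save type");
        }
    }
}
```

In this model, recording C<toy_savestack_ix> before a group of saves and
later passing it to C<toy_leave_scope> plays the role that C<ENTER> and
C<LEAVE> play for the real save stack.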

=head2 Scope Stack

As with the mark stack to the value stack, the scope stack forms a pair with
the save stack. The scope stack stores the height of the save stack at which
nested scopes begin, and allows the save stack to be unwound back to that
point when the scope is left.

When perl is built with debugging enabled, there is a second part to this
stack storing human-readable string names describing the type of stack
context. Each push operation saves the name as well as the height of the save
stack, and each pop operation checks the topmost name with what is expected,
causing an assertion failure if the name does not match.

The base of this stack is pointed to by the interpreter variable
C<PL_scopestack>, of type C<I32 *>. If enabled, the scope stack names are
stored in a separate array pointed to by C<PL_scopestack_name>, of type
C<const char **>.

The head of the stack is indexed by C<PL_scopestack_ix>, an integer which
stores the index of the array or arrays at which the next item should be
pushed. (Note that this is different to most other stacks, which reference the
most recently-pushed item).

Values are pushed to the scope stack using the C<ENTER> macro, which begins a
new nested scope. Any items pushed to the save stack are then restored at the
matching invocation of the C<LEAVE> macro.

=head1 Dynamic Scope and the Context Stack

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

=head2 Introduction to the context stack

In Perl, dynamic scoping refers to the runtime nesting of things like
subroutine calls, evals etc, as well as the entering and exiting of block
scopes. For example, the restoring of a C<local>ised variable is
determined by the dynamic scope.

Perl tracks the dynamic scope by a data structure called the context
stack, which is an array of C<PERL_CONTEXT> structures, each of which is
itself a big union for all the types of context. Whenever a new scope is
entered (such as a block, a C<for> loop, or a subroutine call), a new
context entry is pushed onto the stack. Similarly when leaving a block or
returning from a subroutine call etc. a context is popped. Since the
context stack represents the current dynamic scope, it can be searched.
For example, C<next LABEL> searches back through the stack looking for a
loop context that matches the label; C<return> pops contexts until it
finds a sub or eval context or similar; C<caller> examines sub contexts on
the stack.

Each context entry is labelled with a context type, C<cx_type>. Typical
context types are C<CXt_SUB>, C<CXt_EVAL> etc., as well as C<CXt_BLOCK>
and C<CXt_NULL> which represent, respectively, a basic scope (as pushed
by C<pp_enter>) and a sort block. The type determines which parts of the
context union are valid.

The main division in the context struct is between a substitution scope
(C<CXt_SUBST>) and block scopes, which are everything else. The former is
just used while executing C<s///e>, and won't be discussed further
here.

All the block scope types share a common base, which corresponds to
C<CXt_BLOCK>. This stores the old values of various scope-related
variables like C<PL_curpm>, as well as information about the current
scope, such as C<gimme>. On scope exit, the old variables are restored.

Particular block scope types store extra per-type information. For
example, C<CXt_SUB> stores the currently executing CV, while the various
for loop types might hold the original loop variable SV.
On scope exit,
the per-type data is processed; for example the CV has its reference count
decremented, and the original loop variable is restored.

The macro C<cxstack> returns the base of the current context stack, while
C<cxstack_ix> is the index of the current frame within that stack.

In fact, the context stack is actually part of a stack-of-stacks system;
whenever something unusual is done such as calling a C<DESTROY> or tie
handler, a new stack is pushed, then popped at the end.

Note that the API described here changed considerably in perl 5.24; prior
to that, big macros like C<PUSHBLOCK> and C<POPSUB> were used; in 5.24
they were replaced by the inline static functions described below. In
addition, the ordering and detail of how these macros/functions work
changed in many ways, often subtly. In particular they didn't handle
saving the savestack and temps stack positions, and required additional
C<ENTER>, C<SAVETMPS> and C<LEAVE> compared to the new functions. The
old-style macros will not be described further.


=head2 Pushing contexts

For pushing a new context, the two basic functions are
C<cx = cx_pushblock()>, which pushes a new basic context block and returns
its address, and a family of similar functions with names like
C<cx_pushsub(cx)> which populate the additional type-dependent fields in
the C<cx> struct. Note that C<CXt_NULL> and C<CXt_BLOCK> don't have their
own push functions, as they don't store any data beyond that pushed by
C<cx_pushblock>.

The fields of the context struct and the arguments to the C<cx_*>
functions are subject to change between perl releases, representing
whatever is convenient or efficient for that release.

A typical context stack pushing can be found in C<pp_entersub>; the
following shows a simplified and stripped-down example of a non-XS call,
along with comments showing roughly what each function does.

    dMARK;
    U8 gimme      = GIMME_V;
    bool hasargs  = cBOOL(PL_op->op_flags & OPf_STACKED);
    OP *retop     = PL_op->op_next;
    I32 old_ss_ix = PL_savestack_ix;
    CV *cv        = ....;
    PERL_CONTEXT *cx;

    /* ... make mortal copies of stack args which are PADTMPs here ... */

    /* ... do any additional savestack pushes here ... */

    /* Now push a new context entry of type 'CXt_SUB'; initially just
     * doing the actions common to all block types: */

    cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);

    /* this does (approximately):
        CXINC;              // cxstack_ix++ (grow if necessary)
        cx = CX_CUR();      // and get the address of new frame
        cx->cx_type        = CXt_SUB;
        cx->blk_gimme      = gimme;
        cx->blk_oldsp      = MARK - PL_stack_base;
        cx->blk_oldsaveix  = old_ss_ix;
        cx->blk_oldcop     = PL_curcop;
        cx->blk_oldmarksp  = PL_markstack_ptr - PL_markstack;
        cx->blk_oldscopesp = PL_scopestack_ix;
        cx->blk_oldpm      = PL_curpm;
        cx->blk_old_tmpsfloor = PL_tmps_floor;

        PL_tmps_floor        = PL_tmps_ix;
    */


    /* then update the new context frame with subroutine-specific info,
     * such as the CV about to be executed: */

    cx_pushsub(cx, cv, retop, hasargs);

    /* this does (approximately):
        cx->blk_sub.cv          = cv;
        cx->blk_sub.olddepth    = CvDEPTH(cv);
        cx->blk_sub.prevcomppad = PL_comppad;
        cx->cx_type            |= (hasargs) ? CXp_HASARGS : 0;
        cx->blk_sub.retop       = retop;
        SvREFCNT_inc_simple_void_NN(cv);
    */

Note that C<cx_pushblock()> sets two new floors: for the args stack (to
C<MARK>) and the temps stack (to C<PL_tmps_ix>).
While executing at this
scope level, every C<nextstate> (amongst others) will reset the args and
tmps stack levels to these floors. Note that since C<cx_pushblock> uses
the current value of C<PL_tmps_ix> rather than it being passed as an arg,
this dictates at what point C<cx_pushblock> should be called. In
particular, any new mortals which should be freed only on scope exit
(rather than at the next C<nextstate>) should be created first.

Most callers of C<cx_pushblock> simply set the new args stack floor to the
top of the previous stack frame, but for C<CXt_LOOP_LIST> it stores the
items being iterated over on the stack, and so sets C<blk_oldsp> to the
top of these items instead. Note that, contrary to its name, C<blk_oldsp>
doesn't always represent the value to restore C<PL_stack_sp> to on scope
exit.

Note the early capture of C<PL_savestack_ix> to C<old_ss_ix>, which is
later passed as an arg to C<cx_pushblock>. In the case of C<pp_entersub>,
this is because, although most values needing saving are stored in fields
of the context struct, an extra value needs saving only when the debugger
is running, and it doesn't make sense to bloat the struct for this rare
case. So instead it is saved on the savestack. Since this value gets
calculated and saved before the context is pushed, it is necessary to pass
the old value of C<PL_savestack_ix> to C<cx_pushblock>, to ensure that the
saved value gets freed during scope exit. For most users of
C<cx_pushblock>, where nothing needs pushing on the save stack,
C<PL_savestack_ix> is just passed directly as an arg to C<cx_pushblock>.

Note that where possible, values should be saved in the context struct
rather than on the save stack; it's much faster that way.

Normally C<cx_pushblock> should be immediately followed by the appropriate
C<cx_pushfoo>, with nothing between them; this is because if code
in-between could die (e.g. a warning upgraded to fatal), then the context
stack unwinding code in C<dounwind> would see (in the example above) a
C<CXt_SUB> context frame, but without all the subroutine-specific fields
set, and crashes would soon ensue.

Where the two must be separate, initially set the type to C<CXt_NULL> or
C<CXt_BLOCK>, and later change it to C<CXt_foo> when doing the
C<cx_pushfoo>. This is exactly what C<pp_enteriter> does, once it's
determined which type of loop it's pushing.

=head2 Popping contexts

Contexts are popped using C<cx_popsub()> etc. and C<cx_popblock()>. Note
however, that unlike C<cx_pushblock>, neither of these functions actually
decrements the current context stack index; this is done separately using
C<CX_POP()>.

There are two main ways that contexts are popped. During normal execution
as scopes are exited, functions like C<pp_leave>, C<pp_leaveloop> and
C<pp_leavesub> process and pop just one context using C<cx_popfoo> and
C<cx_popblock>. On the other hand, things like C<pp_return> and C<next>
may have to pop back several scopes until a sub or loop context is found,
and exceptions (such as C<die>) need to pop back contexts until an eval
context is found. Both of these are accomplished by C<dounwind()>, which
is capable of processing and popping all contexts above the target one.

Here is a typical example of context popping, as found in C<pp_leavesub>
(simplified slightly):

    U8 gimme;
    PERL_CONTEXT *cx;
    SV **oldsp;
    OP *retop;

    cx = CX_CUR();

    gimme = cx->blk_gimme;
    oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */

    if (gimme == G_VOID)
        PL_stack_sp = oldsp;
    else
        leave_adjust_stacks(oldsp, oldsp, gimme, 0);

    CX_LEAVE_SCOPE(cx);
    cx_popsub(cx);
    cx_popblock(cx);
    retop = cx->blk_sub.retop;
    CX_POP(cx);

    return retop;

The steps above are in a very specific order, designed to be the reverse
order of when the context was pushed. The first thing to do is to copy
and/or protect any return arguments and free any temps in the current
scope. Scope exits like an rvalue sub normally return a mortal copy of
their return args (as opposed to lvalue subs). It is important to make
this copy before the save stack is popped or variables are restored, or
bad things like the following can happen:

    sub f { my $x =...; $x }  # $x freed before we get to copy it
    sub f { /(...)/;    $1 }  # PL_curpm restored before $1 copied

Although we wish to free any temps at the same time, we have to be careful
not to free any temps which are keeping return args alive; nor to free the
temps we have just created while mortal copying return args. Fortunately,
C<leave_adjust_stacks()> is capable of making mortal copies of return args,
shifting args down the stack, and only processing those entries on the
temps stack that are safe to do so.

In void context no args are returned, so it's more efficient to skip
calling C<leave_adjust_stacks()>. Also in void context, a C<nextstate> op
is likely to be imminently called which will do a C<FREETMPS>, so there's
no need to do that either.

The next step is to pop savestack entries: C<CX_LEAVE_SCOPE(cx)> is just
defined as C<< LEAVE_SCOPE(cx->blk_oldsaveix) >>. Note that during the
popping, it's possible for perl to call destructors, call C<STORE> to undo
localisations of tied vars, and so on. Any of these can die or call
C<exit()>. In this case, C<dounwind()> will be called, and the current
context stack frame will be re-processed. Thus it is vital that all steps
in popping a context are done in such a way to support reentrancy. The
other alternative, of decrementing C<cxstack_ix> I<before> processing the
frame, would lead to leaks and the like if something died halfway through,
or overwriting of the current frame.

C<CX_LEAVE_SCOPE> itself is safely re-entrant: if only half the savestack
items have been popped before dying and getting trapped by eval, then the
C<CX_LEAVE_SCOPE>s in C<dounwind> or C<pp_leaveeval> will continue where
the first one left off.

The next step is the type-specific context processing; in this case
C<cx_popsub>. In part, this looks like:

    cv = cx->blk_sub.cv;
    CvDEPTH(cv) = cx->blk_sub.olddepth;
    cx->blk_sub.cv = NULL;
    SvREFCNT_dec(cv);

where it's processing the just-executed CV. Note that before it decrements
the CV's reference count, it nulls the C<blk_sub.cv>. This means that if
it re-enters, the CV won't be freed twice. It also means that you can't
rely on such type-specific fields having useful values after the return
from C<cx_popfoo>.
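
The "null the field, then drop the reference" ordering can be shown in
isolation with a small standalone C sketch. It is not taken from the perl
source: C<toy_cv>, C<toy_frame>, C<toy_refcnt_dec> and C<toy_popsub> are
hypothetical names, and "freeing" is only recorded in a counter so that
the effect can be observed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a CV. */
typedef struct {
    int refcnt;
    int freed;      /* how many times it has been "freed" */
} toy_cv;

/* Hypothetical context frame holding one counted reference. */
typedef struct {
    toy_cv *cv;
} toy_frame;

/* Drop one reference; NULL is tolerated, as with SvREFCNT_dec. */
static void toy_refcnt_dec(toy_cv *cv)
{
    if (cv && --cv->refcnt == 0)
        cv->freed++;    /* a real implementation would release memory */
}

/* Clear the field BEFORE dropping the reference, so that processing
 * the same frame a second time (as an unwinder might after a die)
 * finds NULL and does nothing. */
static void toy_popsub(toy_frame *cx)
{
    toy_cv *cv = cx->cv;

    cx->cv = NULL;
    toy_refcnt_dec(cv);
}
```

Calling C<toy_popsub> twice on the same frame releases the object only
once. Had the field been cleared after the decrement, a re-entry occurring
during the decrement would still see the old pointer.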

Next, C<cx_popblock> restores all the various interpreter vars to their
previous values or previous high water marks; it expands to:

    PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
    PL_scopestack_ix = cx->blk_oldscopesp;
    PL_curpm = cx->blk_oldpm;
    PL_curcop = cx->blk_oldcop;
    PL_tmps_floor = cx->blk_old_tmpsfloor;

Note that it I<doesn't> restore C<PL_stack_sp>; as mentioned earlier,
which value to restore it to depends on the context type (specifically
C<for (list) {}>), and what args (if any) it returns; and that will
already have been sorted out earlier by C<leave_adjust_stacks()>.

Finally, the context stack pointer is actually decremented by C<CX_POP(cx)>.
After this point, it's possible that the current context frame could
be overwritten by other contexts being pushed. Although things like ties
and C<DESTROY> are supposed to work within a new context stack, it's best
not to assume this. Indeed on debugging builds, C<CX_POP(cx)> deliberately
sets C<cx> to null to detect code that is still relying on the field
values in that context frame. Note in the C<pp_leavesub()> example above,
we grab C<blk_sub.retop> I<before> calling C<CX_POP>.

=head2 Redoing contexts

Finally, there is C<cx_topblock(cx)>, which acts like a super-C<nextstate>
as regards resetting various vars to their base values. It is used in
places like C<pp_next>, C<pp_redo> and C<pp_goto> where rather than
exiting a scope, we want to re-initialise the scope. As well as resetting
C<PL_stack_sp> like C<nextstate>, it also resets C<PL_markstack_ptr>,
C<PL_scopestack_ix> and C<PL_curpm>. Note that it doesn't do a
C<FREETMPS>.


=head1 Slab-based operator allocation

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

Perl's internal error-handling mechanisms implement C<die> (and its internal
equivalents) using longjmp. If this occurs during lexing, parsing or
compilation, we must ensure that any ops allocated as part of the compilation
process are freed. (Older Perl versions did not adequately handle this
situation: when failing a parse, they would leak ops that were stored in
C C<auto> variables and not linked anywhere else.)

To handle this situation, Perl uses I<op slabs> that are attached to the
currently-compiling CV. A slab is a chunk of allocated memory. New ops are
allocated as regions of the slab. If the slab fills up, a new one is created
(and linked from the previous one). When an error occurs and the CV is freed,
any ops remaining are freed.

Each op is preceded by two pointers: one points to the next op in the slab, and
the other points to the slab that owns it. The next-op pointer is needed so
that Perl can iterate over a slab and free all its ops. (Op structures are of
different sizes, so the slab's ops can't merely be treated as a dense array.)
The slab pointer is needed for accessing a reference count on the slab: when
the last op on a slab is freed, the slab itself is freed.

The slab allocator puts the ops at the end of the slab first. This will tend to
allocate the leaves of the op tree first, and the layout will therefore
hopefully be cache-friendly. In addition, this means that there's no need to
store the size of the slab (see below on why slabs vary in size), because Perl
can follow pointers to find the last op.

It might seem possible to eliminate slab reference counts altogether, by having
all ops implicitly attached to C<PL_compcv> when allocated and freed when the
CV is freed. That would also allow C<op_free> to skip C<FreeOp> altogether, and
thus free ops faster. But that doesn't work in those cases where ops need to
survive beyond their CVs, such as re-evals.

The CV also has to have a reference count on the slab. Sometimes the first op
created is immediately freed. If the reference count of the slab reaches 0,
then it will be freed with the CV still pointing to it.

CVs use the C<CVf_SLABBED> flag to indicate that the CV has a reference count
on the slab. When this flag is set, the slab is accessible via C<CvSTART> when
C<CvROOT> is not set, or by subtracting two pointers C<(2*sizeof(I32 *))> from
C<CvROOT> when it is set. The alternative to this approach of sneaking the slab
into C<CvSTART> during compilation would be to enlarge the C<xpvcv> struct by
another pointer. But that would make all CVs larger, even though slab-based op
freeing is typically of benefit only for programs that make significant use of
string eval.

When the C<CVf_SLABBED> flag is set, the CV takes responsibility for freeing
the slab. If C<CvROOT> is not set when the CV is freed or undeffed, it is
assumed that a compilation error has occurred, so the op slab is traversed and
all the ops are freed.

Under normal circumstances, the CV forgets about its slab (decrementing the
reference count) when the root is attached. So the slab reference counting that
happens when ops are freed takes care of freeing the slab. In some cases, the
CV is told to forget about the slab (C<cv_forget_slab>) precisely so that the
ops can survive after the CV is done away with.

Forgetting the slab when the root is attached is not strictly necessary, but
avoids potential problems with C<CvROOT> being written over. There is code all
over the place, both in core and on CPAN, that does things with C<CvROOT>, so
forgetting the slab makes things more robust and avoids potential problems.
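
The reference-counting scheme described above can be modelled in a few
lines of standalone C. The following is a deliberately simplified sketch,
not the code in F<op.c>: all C<toy_*> names are hypothetical, the slab has
a fixed size, a full slab simply fails instead of chaining a new one, and
the C<next> link points at the previously allocated op. It assumes that
all object pointers have the same size as C<void *>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SLAB_WORDS 64

typedef struct toy_slab toy_slab;

/* Header in front of every op: two pointers, as described in the text. */
typedef struct toy_slot {
    struct toy_slot *next;   /* another op in the same slab */
    toy_slab        *slab;   /* owning slab, used for reference counting */
} toy_slot;

struct toy_slab {
    size_t    refcnt;        /* one per live op, plus one for the CV */
    toy_slot *first;         /* most recently allocated slot */
    size_t    avail;         /* unused words at the front of space[] */
    void     *space[TOY_SLAB_WORDS];
};

static int toy_slabs_freed = 0;   /* lets the demo observe slab release */

static toy_slab *toy_slab_new(void)
{
    toy_slab *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->refcnt = 1;                /* the reference held by the CV */
    s->first  = NULL;
    s->avail  = TOY_SLAB_WORDS;
    return s;
}

static void toy_slab_release(toy_slab *s)
{
    if (--s->refcnt == 0) {
        toy_slabs_freed++;
        free(s);
    }
}

/* Allocate an op body of 'op_words' words, filling the slab from its end. */
static void *toy_op_alloc(toy_slab *s, size_t op_words)
{
    size_t need = sizeof(toy_slot) / sizeof(void *) + op_words;
    toy_slot *slot;
    if (need > s->avail)
        return NULL;              /* real code would chain a bigger slab */
    s->avail -= need;
    slot = (toy_slot *)&s->space[s->avail];
    slot->next = s->first;
    slot->slab = s;
    s->first = slot;
    s->refcnt++;
    return slot + 1;              /* the body starts right after the header */
}

static void toy_op_free(void *op)
{
    toy_slot *slot = (toy_slot *)op - 1;   /* step back to the header */
    toy_slab_release(slot->slab);
}

/* Returns 1 if the model behaves as the text describes. */
static int toy_demo_slab(void)
{
    int before = toy_slabs_freed;
    toy_slab *s = toy_slab_new();
    void *a, *b;
    int ok;

    if (s == NULL)
        return -1;
    a = toy_op_alloc(s, 4);
    if (a == NULL)
        return -1;
    toy_op_free(a);                         /* first op freed at once */
    ok = (toy_slabs_freed == before);       /* CV's count kept slab alive */

    a = toy_op_alloc(s, 4);
    b = toy_op_alloc(s, 2);
    if (a == NULL || b == NULL)
        return -1;
    ok = ok && (char *)b < (char *)a;       /* later ops sit lower down */

    toy_slab_release(s);                    /* the CV forgets the slab */
    ok = ok && (toy_slabs_freed == before); /* live ops still hold it */
    toy_op_free(a);
    toy_op_free(b);                         /* last op releases the slab */
    return ok && (toy_slabs_freed == before + 1);
}
```

The demo shows the two cases the text singles out: the slab survives the
first op being freed immediately because the CV holds its own count, and
after the CV has forgotten the slab the ops alone keep it alive until the
last one is freed.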

Since the CV takes ownership of its slab when flagged, that flag is never
copied when a CV is cloned, as one CV could free a slab that another CV still
points to, since forced freeing of ops ignores the reference count (but asserts
that it looks right).

To avoid slab fragmentation, freed ops are marked as freed and attached to the
slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused
when possible. Not reusing freed ops would be simpler, but it would result in
significantly higher memory usage for programs with large C<if (DEBUG) {...}>
blocks.

C<SAVEFREEOP> is slightly problematic under this scheme. Sometimes it can cause
an op to be freed after its CV. If the CV has forcibly freed the ops on its
slab and the slab itself, then we will be fiddling with a freed slab. Making
C<SAVEFREEOP> a no-op doesn't help, as sometimes an op can be savefreed when
there is no compilation error, so the op would never be freed. It holds
a reference count on the slab, so the whole slab would leak. So C<SAVEFREEOP>
now sets a special flag on the op (C<< ->op_savefree >>). The forced freeing of
ops after a compilation error won't free any ops thus marked.

Since many pieces of code create tiny subroutines consisting of only a few ops,
and since a huge slab would be quite a bit of baggage for those to carry
around, the first slab is always very small. To avoid allocating too many
slabs for a single CV, each subsequent slab is twice the size of the previous.

Smartmatch expects to be able to allocate an op at run time, run it, and then
throw it away. For that to work the op is simply malloced when C<PL_compcv>
hasn't been set up. So all slab-allocated ops are marked as such
(C<< ->op_slabbed >>), to distinguish them from malloced ops.
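
The idea of a freed chain can be sketched in standalone C as follows.
This is an illustration under stated assumptions rather than perl's
implementation: the C<toy_*> names are hypothetical, a single fixed-size
pool stands in for a slab, and a freed slot is reused by first fit
whenever it is at least as large as the request, which is a policy chosen
here for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef union { void *p; size_t n; } toy_word;   /* alignment unit */

/* Header in front of each slot; doubles as the freed-chain link. */
typedef struct toy_fslot {
    struct toy_fslot *next_free;
    size_t            words;       /* body size, in toy_words */
} toy_fslot;

#define TOY_HDR_WORDS \
    ((sizeof(toy_fslot) + sizeof(toy_word) - 1) / sizeof(toy_word))
#define TOY_POOL_WORDS 64

typedef struct toy_pool {
    toy_fslot *freed;              /* chain of slots available for reuse */
    size_t     used;               /* words carved from space[] so far */
    toy_word   space[TOY_POOL_WORDS];
} toy_pool;

static void *toy_pool_alloc(toy_pool *p, size_t words)
{
    toy_fslot **link, *slot;

    /* First try to reuse a freed slot that is large enough. */
    for (link = &p->freed; (slot = *link) != NULL; link = &slot->next_free) {
        if (slot->words >= words) {
            *link = slot->next_free;          /* unlink from the chain */
            slot->next_free = NULL;
            return (toy_word *)slot + TOY_HDR_WORDS;
        }
    }

    /* Otherwise carve fresh space from the pool. */
    if (TOY_HDR_WORDS + words > TOY_POOL_WORDS - p->used)
        return NULL;
    slot = (toy_fslot *)&p->space[p->used];
    p->used += TOY_HDR_WORDS + words;
    slot->next_free = NULL;
    slot->words = words;
    return (toy_word *)slot + TOY_HDR_WORDS;
}

/* "Freeing" only pushes the slot onto the freed chain. */
static void toy_pool_free(toy_pool *p, void *body)
{
    toy_fslot *slot = (toy_fslot *)((toy_word *)body - TOY_HDR_WORDS);
    slot->next_free = p->freed;
    p->freed = slot;
}

/* Returns 1 if a freed slot is reused without consuming new space. */
static int toy_demo_reuse(void)
{
    toy_pool *p = malloc(sizeof *p);
    void *a, *b, *c;
    size_t used_before;
    int ok;

    if (p == NULL)
        return -1;
    p->freed = NULL;
    p->used  = 0;

    a = toy_pool_alloc(p, 4);
    b = toy_pool_alloc(p, 4);
    used_before = p->used;
    toy_pool_free(p, a);
    c = toy_pool_alloc(p, 3);      /* fits in the slot that held 'a' */
    ok = (a != NULL) && (b != NULL) && (b != a)
         && (c == a) && (p->used == used_before);
    free(p);
    return ok;
}
```

In the demo the third allocation lands in the slot vacated by the first,
so the pool's high-water mark does not move. Without the freed chain,
every allocation after a free would consume new space, which is the extra
memory usage the text warns about.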


=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed>
