=for comment
The part of this file between =for mg_vtable.pl markers is auto
generated by mg_vtable.pl; any changes there need to be made instead to
mg_vtable.pl

=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=for apidoc_section $AV
=for apidoc Ayh||AV
=for apidoc_section $HV
=for apidoc Ayh||HV
=for apidoc_section $SV
=for apidoc Ayh||SV

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses several special typedefs to declare variables to hold
integers of (at least) a given size.
Use I8, I16, I32, and I64 to declare a signed integer variable which has
at least as many bits as the number in its name. These all evaluate to
the native C type that is closest to the given number of bits, but no
smaller than that number. For example, on many platforms, a C<short> is
16 bits long, and if so, I16 will evaluate to a C<short>. But on
platforms where a C<short> isn't exactly 16 bits, Perl will use the
smallest type that contains 16 bits or more.

Use U8, U16, U32, and U64 to declare the corresponding unsigned integer
types.

If the platform doesn't support 64-bit integers, both I64 and U64 will
be undefined.
Use IV and UV to declare the largest practicable integer types, and
C<L<perlapi/WIDEST_UTYPE>> for the absolute maximum unsigned type, which
may not be usable in all circumstances.

A numeric constant can be specified with L<perlapi/C<INT16_C>>,
L<perlapi/C<UINTMAX_C>>, and similar.

=for apidoc_section $integer
=for apidoc Ayh||I8
=for apidoc_item ||I16
=for apidoc_item ||I32
=for apidoc_item ||I64
=for apidoc_item ||IV

=for apidoc Ayh||U8
=for apidoc_item ||U16
=for apidoc_item ||U32
=for apidoc_item ||U64
=for apidoc_item ||UV

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value". You might think that it is misnamed
because it is described as pointing only to strings. However, it is
possible to have it point to other things. For example, it could point
to an array of UVs. But
using it for non-strings requires care, as the underlying assumption of
much of the internals is that PVs are just for strings. Often, for
example, a trailing C<NUL> is tacked on automatically. The non-string use
is documented only in this paragraph.)

=for apidoc_section $floating
=for apidoc Ayh||NV

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

=for apidoc_section $string
=for apidoc Ayh||STRLEN

In the unlikely case of an SV requiring more complex initialization, you
can create an empty SV with newSV(len).
If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has the undef value.

    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                          * allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                      SV **, Size_t, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic".
See L</Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a C<NUL> character.
If a string is not C<NUL>-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a C<NUL>-terminated string.
Perl's own functions typically add a trailing C<NUL> for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, Perl's API exposes
several macros that coerce the actual scalar type into an IV, UV, double,
or string:

=over

=item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>)

=item * C<SvNV(SV*)> (C<double>)

=item * Strings are a bit complicated:

=over

=item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>.

This is suitable for Perl strings that represent bytes.

=item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>.

This is suitable for Perl strings that represent characters.

B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant,
which means that if the SV contains non-Unicode code points (e.g.,
0x110000), then the result may contain extensions over valid UTF-8.
See L<perlapi/is_strict_utf8_string> for some methods Perl gives
you to check the UTF-8 validity of these macros' returns.

=item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)>
to fetch the SV's raw internal buffer. This is tricky, though; if your Perl
string
is C<"\xff\xff">, then depending on the SV's internal encoding you might get
back a 2-byte B<OR> a 4-byte C<char*>.
Moreover, if it's the 4-byte string, that could come from either Perl
C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored
as raw octets. To differentiate between these you B<MUST> look up the
SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string
is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be
off).

B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or
similarly-named macros I<without> looking up the SV's UTF8 bit is
almost certainly a bug if non-ASCII input is allowed.

When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies
here as for C<SvPVutf8>.

=back

(See L</How do I pass a Perl string to a C library?> for more details.)

In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned
is placed into the
variable C<len> (these are macros, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use C<SvPVbyte_nolen>,
C<SvPVutf8_nolen>, or C<SvPV_nolen> instead.
The global variable C<PL_na> can also be given to
C<SvPVbyte>/C<SvPVutf8>/C<SvPV>
in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a C<NUL>.

Also remember that C doesn't allow you to safely say C<foo(SvPVbyte(s, len),
len);>. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPVbyte(s, len);
    foo(ptr, len);

=back

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

If you want to write to an existing SV's buffer and set its value to a
string, use SvPVbyte_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
without extra copying:

    (void)SvPVbyte_force(sv, len);
    s = SvGROW(sv, len + needlen + 1);
    /* something that may modify up to needlen bytes at s+len, but
       actually modifies newlen bytes
         eg. newlen = read(fd, s + len, needlen);
       ignoring errors for these examples
     */
    s[len + newlen] = '\0';
    SvCUR_set(sv, len + newlen);
    SvUTF8_off(sv);
    SvSETMAGIC(sv);

If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn(). If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().

If you don't need the existing content of the SV, you can avoid some
copying with:

    SvPVCLEAR(sv);
    s = SvGROW(sv, needlen + 1);
    /* something that may modify up to needlen bytes at s, but
       actually modifies newlen bytes
         eg. newlen = read(fd, s, needlen);
     */
    s[newlen] = '\0';
    SvCUR_set(sv, newlen);
    SvPOK_only(sv); /* also clears SVf_UTF8 */
    SvSETMAGIC(sv);

Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().

If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags(). That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:

    Newx(buf, somesize+1, char);
    /* ... fill in buf ... */
    buf[somesize] = '\0';
    sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
    /* buf now belongs to perl, don't release it */

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

Be aware that retrieving the numeric value of an SV can set IOK or NOK
on that SV, even when the SV started as a string. Prior to Perl
5.36.0 retrieving the string value of an integer could set POK, but
this can no longer occur. From 5.36.0 these flags can be used to
distinguish the original representation of an SV, which is intended to
make life simpler for serializers:

    /* references handled elsewhere */
    if (SvIsBOOL(sv)) {
        /* originally boolean */
        ...
    }
    else if (SvPOK(sv)) {
        /* originally a string */
        ...
    }
    else if (SvNIOK(sv)) {
        /* originally numeric */
        ...
    }
    else {
        /* something special or undef */
    }

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, STRLEN val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                      Size_t, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function
extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L</Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", 0);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed.
Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example,
when interfacing with Perl code, it'll work correctly for:

    foo(undef);

But won't work when called as:

    $x = undef;
    foo($x);

So, to repeat, always use SvOK() to check whether an sv is defined.

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L</AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L</Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly.
(A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example. Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play. In
the current implementation, the final byte of a string buffer is used as a
copy-on-write reference count. If the buffer is not big enough, then
copy-on-write is skipped. First have a look at an empty string:

  % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
  SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0x7ffb7bc05b50 ""\0
    CUR = 0
    LEN = 10

Notice here the LEN is 10. (It may differ on your platform.) Extend the
length of the string to one less than 10, and do a substitution:

 % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
                                                            Dump($a)'
 SV = PV(0x7ffa04008a70) at 0x7ffa04030390
   REFCNT = 1
   FLAGS = (POK,OOK,pPOK)
   OFFSET = 1
   PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
   CUR = 8
   LEN = 9

Here the number of bytes chopped off (1) is shown next as the OFFSET. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one. (The first character of the string
buffer happens to have changed to "\1" here, not "1", because the current
implementation stores the offset count in the string buffer. This is
subject to change.)

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

=for apidoc_section $AV
=for apidoc Amh||AvALLOC|AV* av

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
usually these macros will always return TRUE and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(SSize_t num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on it:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, SSize_t num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    SSize_t av_top_index(AV*);
    SV**    av_fetch(AV*, SSize_t key, I32 lval);
    SV**    av_store(AV*, SSize_t key, SV* val);

The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

A few more:

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, SSize_t key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", 0);

This returns NULL if the variable does not exist.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on it:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value.
However, you should check to make sure that the return value is
not NULL before dereferencing it.

The first of these two functions checks if a hash table entry exists, and the
second deletes it.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. The C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

=for apidoc_section $HV
=for apidoc Ayh||HE

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value is the SV* that the function
               returns */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", 0);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH> macro:

    PERL_HASH(hash, key, klen)

The exact implementation of this macro varies by architecture and version
of perl, and the return value may change per invocation, so the value
is only valid for the duration of a single perl process.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=for apidoc_section $HV
=for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use
C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code. In perl 5.20, storing
&PL_sv_undef will create a read-only element, because the scalar
&PL_sv_undef itself is stored, not a copy.

Similar problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes.
This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including other references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_PVAV    Array
    SVt_PVHV    Hash
    SVt_PVCV    Code
    SVt_PVGV    Glob (possibly a file handle)

Any numerical value returned which is less than SVt_PVAV will be a scalar
of some form.

See L<perlapi/svtype> for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L</Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

The following function upgrades C<rv> to a reference if it is not already
one, and creates a new SV for C<rv> to point to. If C<classname> is
non-null, the new SV is blessed into the specified class. The new SV is
returned.

    SV* newSVrv(SV* rv, const char* classname);

The following three functions copy an integer, unsigned integer or double
into an SV whose reference is C<rv>. The SV is blessed if C<classname> is
non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

The following function copies the pointer value (I<the address, not the
string!>) into an SV whose reference is C<rv>. The SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

The following function copies a string into an SV whose reference is C<rv>.
Set length to 0 to let Perl calculate the string length. SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
                      STRLEN length);

The following function tests whether the SV is blessed into the specified
class. It does not check inheritance relationships.

    int sv_isa(SV* sv, const char* name);

The following function tests whether the SV is a reference to a blessed object.

    int sv_isobject(SV* sv);

The following function tests whether the SV is derived from the specified
class. SV can be either a reference to a blessed object or a string
containing a class name. This is the function implementing the
C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV* get_sv("package::varname", GV_ADD);
    AV* get_av("package::varname", GV_ADD);
    HV* get_hv("package::varname", GV_ADD);

Notice the use of GV_ADD as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<GV_ADD> argument to enable certain extra features. Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

    Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

    Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.
At the most basic internal level, reference counts can be manipulated
with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

(There are also suffixed versions of the increment and decrement macros,
for situations where the full generality of these basic macros can be
exchanged for some performance.)

However, the way a programmer should think about references is not so
much in terms of the bare reference count, but in terms of I<ownership>
of references. A reference to an xV can be owned by any of a variety
of entities: another xV, the Perl interpreter, an XS data structure,
a piece of running code, or a dynamic scope. An xV generally does not
know what entities own the references to it; it only knows how many
references there are, which is the reference count.

To correctly maintain reference counts, it is essential to keep track
of what references the XS code is manipulating. The programmer should
always know where a reference has come from and who owns it, and be
aware of any creation or destruction of references, and any transfers
of ownership. Because ownership isn't represented explicitly in the xV
data structures, only the reference count need be actually maintained
by the code, and that means that this understanding of ownership is not
actually evident in the code.
For example, transferring ownership of a
reference from one owner to another doesn't change the reference count
at all, so may be achieved with no actual code. (The transferring code
doesn't touch the referenced object, but does need to ensure that the
former owner knows that it no longer owns the reference, and that the
new owner knows that it now does.)

An xV that is visible at the Perl level should not become unreferenced
and thus be destroyed. Normally, an object will only become unreferenced
when it is no longer visible, often by the same means that makes it
invisible. For example, a Perl reference value (RV) owns a reference to
its referent, so if the RV is overwritten that reference gets destroyed,
and the no-longer-reachable referent may be destroyed as a result.

Many functions have some kind of reference manipulation as
part of their purpose. Sometimes this is documented in terms
of ownership of references, and sometimes it is (less helpfully)
documented in terms of changes to reference counts. For example, the
L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV
(with reference count 1) and increment the reference count of the referent
that was supplied by the caller. This is best understood as creating
a new reference to the referent, which is owned by the created RV,
and returning to the caller ownership of the sole reference to the RV.
The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not
increment the reference count of the referent, but the RV nevertheless
ends up owning a reference to the referent. It is therefore implied
that the caller of C<newRV_noinc()> is relinquishing a reference to the
referent, making this conceptually a more complicated operation even
though it does less to the data structures.

For example, imagine you want to return a reference from an XSUB
function.
Inside the XSUB routine, you create an SV which initially
has just a single reference, owned by the XSUB routine. This reference
needs to be disposed of before the routine is complete, otherwise it
will leak, preventing the SV from ever being destroyed. So to create
an RV referencing the SV, it is most convenient to pass the SV to
C<newRV_noinc()>, which consumes that reference. Now the XSUB routine
no longer owns a reference to the SV, but does own a reference to the RV,
which in turn owns a reference to the SV. The ownership of the reference
to the RV is then transferred by the process of returning the RV from
the XSUB.

There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
Much documentation speaks of an xV itself being mortal, but this is
misleading. It is really I<a reference to> an xV that is mortal, and it
is possible for there to be more than one mortal reference to a single xV.
For a reference to be mortal means that it is owned by the temps stack,
one of perl's many internal stacks, which will destroy that reference
"a short time later". Usually the "short time later" is the end of
the current Perl statement. However, it gets more complicated around
dynamic scopes: there can be multiple sets of mortal references hanging
around at the same time, with different death dates. Internally, the
actual determinant for when mortal xV references are destroyed depends
on two macros, SAVETMPS and FREETMPS. See L<perlcall> and L<perlxs>
and L</Temporaries Stack> below for more details on these macros.

Mortal references are mainly used for xVs that are placed on perl's
main stack. The stack is problematic for reference tracking, because it
contains a lot of xV references, but doesn't own those references: they
are not counted.
Currently, there are many bugs resulting from xVs being
destroyed while referenced by the stack, because the stack's uncounted
references aren't enough to keep the xVs alive. So when putting an
(uncounted) reference on the stack, it is vitally important to ensure that
there will be a counted reference to the same xV that will last at least
as long as the uncounted reference. But it's also important that that
counted reference be cleaned up at an appropriate time, and not unduly
prolong the xV's life. For there to be a mortal reference is often the
best way to satisfy this requirement, especially if the xV was created
especially to be put on the stack and would otherwise be unreferenced.

To create a mortal reference, use the functions:

    SV* sv_newmortal()
    SV* sv_mortalcopy(SV*)
    SV* sv_2mortal(SV*)

C<sv_newmortal()> creates an SV (with the undefined value) whose sole
reference is mortal. C<sv_mortalcopy()> creates an xV whose value is a
copy of a supplied xV and whose sole reference is mortal. C<sv_2mortal()>
mortalises an existing xV reference: it transfers ownership of a reference
from the caller to the temps stack. Because C<sv_newmortal> gives the new
SV no value, it must normally be given one via C<sv_setpv>, C<sv_setiv>,
etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that is multiple C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

The mortal routines are not just for SVs; AVs and HVs can be
made mortal by passing their address (type-casted to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package.
Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package. To get at the items in other packages, append the
string "::" to the package name. The items in the C<Foo> package are in
the stash C<Foo::> in C<PL_defstash>. The items in the C<Bar::Baz> package are
in the stash C<Baz::> in C<Bar::>'s stash.

=for apidoc_section $GV
=for apidoc Amnh||PL_defstash

To get the stash pointer for a particular package, use one of the functions:

    HV* gv_stashpv(const char* name, I32 flags)
    HV* gv_stashsv(SV*, I32 flags)

The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. The C<flags> argument will create a new package if it is set to
C<GV_ADD>.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

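As a rough illustration of the nesting described above, the following
plain-C sketch splits a package name into the sequence of stash keys that
would be looked up, starting from C<PL_defstash>. The helper
C<stash_key_at> is hypothetical and is not part of the Perl API; it only
models the naming rule and does not touch any real stash.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of the Perl API.  Writes into `out` the
 * n-th (0-based) stash key on the path to package `pkg`: for "Bar::Baz",
 * key 0 is "Bar::" and key 1 is "Baz::".
 * Returns 1 on success, 0 if there is no such component, the component
 * is empty, or the key does not fit in `out`. */
static int stash_key_at(const char *pkg, size_t n, char *out, size_t outlen)
{
    const char *start = pkg;

    for (;;) {
        const char *sep = strstr(start, "::");
        size_t len = sep ? (size_t)(sep - start) : strlen(start);

        if (len == 0)
            return 0;                   /* empty component */
        if (n == 0) {
            if (len + 3 > outlen)
                return 0;               /* name + "::" + NUL must fit */
            memcpy(out, start, len);
            memcpy(out + len, "::", 3); /* also copies the trailing NUL */
            return 1;
        }
        if (!sep)
            return 0;                   /* ran out of components */
        start = sep + 2;                /* skip past the "::" */
        n--;
    }
}
```

In real XS code you would not walk the stashes by hand: passing the full
name, as in C<gv_stashpv("Bar::Baz", GV_ADD)>, yields the final C<HV*>
directly.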

Alternately, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV* SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char* HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV* sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 I/O Handles

Like AVs and HVs, IO objects are another type of non-scalar SV which
may contain input and output L<PerlIO|perlapio> objects or a C<DIR *>
from opendir().

You can create a new IO object:

    IO* newIO();

Unlike other SVs, a new IO object is automatically blessed into the
L<IO::File> class.

The IO object contains an input and output PerlIO handle:

    PerlIO *IoIFP(IO *io);
    PerlIO *IoOFP(IO *io);

=for apidoc_section $io
=for apidoc Amh|PerlIO *|IoIFP|IO *io
=for apidoc Amh|PerlIO *|IoOFP|IO *io

Typically if the IO object has been opened on a file, the input handle
is always present, but the output handle is only present if the file
is open for output. For a file, if both are present they will be the
same PerlIO object.

Distinct input and output PerlIO objects are created for sockets and
character devices.

The IO object also contains other data associated with Perl I/O
handles:

    IV IoLINES(io);          /* $. */
    IV IoPAGE(io);           /* $% */
    IV IoPAGE_LEN(io);       /* $= */
    IV IoLINES_LEFT(io);     /* $- */
    char *IoTOP_NAME(io);    /* $^ */
    GV *IoTOP_GV(io);        /* $^ */
    char *IoFMT_NAME(io);    /* $~ */
    GV *IoFMT_GV(io);        /* $~ */
    char *IoBOTTOM_NAME(io);
    GV *IoBOTTOM_GV(io);
    char IoTYPE(io);
    U8 IoFLAGS(io);

    =for apidoc_sections $io_scn, $formats_section
=for apidoc_section $reports
=for apidoc Amh|IV|IoLINES|IO *io
=for apidoc Amh|IV|IoPAGE|IO *io
=for apidoc Amh|IV|IoPAGE_LEN|IO *io
=for apidoc Amh|IV|IoLINES_LEFT|IO *io
=for apidoc Amh|char *|IoTOP_NAME|IO *io
=for apidoc Amh|GV *|IoTOP_GV|IO *io
=for apidoc Amh|char *|IoFMT_NAME|IO *io
=for apidoc Amh|GV *|IoFMT_GV|IO *io
=for apidoc Amh|char *|IoBOTTOM_NAME|IO *io
=for apidoc Amh|GV *|IoBOTTOM_GV|IO *io
=for apidoc_section $io
=for apidoc Amh|char|IoTYPE|IO *io
=for apidoc Amh|U8|IoFLAGS|IO *io

Most of these are involved with L<formats|perlform>.

C<IoFLAGS()> may contain a combination of flags, the most interesting of
which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>,
settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>.

=for apidoc Amnh||IOf_FLUSH
=for apidoc Amnh||IOf_UNTAINT

The IO object may also contain a directory handle:

    DIR *IoDIRP(io);

=for apidoc Amh|DIR *|IoDIRP|IO *io

suitable for use with PerlDir_read() etc.

All of these accessor macros are lvalues; there are no distinct
C<_set()> macros to modify the members of the IO object.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data.
For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int dberror;
    extern char *dberror_list[];    /* one message per error number */

    SV* sv = get_sv("dberror", GV_ADD);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Read-Only Values

In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
flag bit with read-only scalars. So the only way to test whether
C<sv_setsv>, etc., will raise a "Modification of a read-only value" error
in those versions is:

    SvREADONLY(sv) && !SvIsCOW(sv)

Under Perl 5.18 and later, C<SvREADONLY> only applies to read-only variables,
and, under 5.20, copy-on-write scalars can also be read-only, so the above
check is incorrect.
You just want:

    SvREADONLY(sv)

If you need to do this check often, define your own macro like this:

    #if PERL_VERSION >= 18
    # define SvTRULYREADONLY(sv) SvREADONLY(sv)
    #else
    # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
    #endif

=head2 Copy on Write

Perl implements a copy-on-write (COW) mechanism for scalars, in which
string copies are not immediately made when requested, but are deferred
until made necessary by one or the other scalar changing. This is mostly
transparent, but one must take care not to modify string buffers that are
shared by multiple SVs.

You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>.

You can force an SV to make its own copy of its string buffer by calling
C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>.

If you want to make the SV drop its string buffer, use
C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply
C<sv_setsv(sv, NULL)>.

All of these functions will croak on read-only scalars (see the previous
section for more on those).

To test that your code is behaving correctly and not modifying COW buffers,
on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with
C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations
into crashes. You will find it to be marvellously slow, so you may want to
skip perl's own tests.

=head2 Magic Variables

[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        I32         mg_len;
        SV*         mg_obj;
        char*       mg_ptr;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the sv_magic function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>.
Perl then continues by adding new magic
to the beginning of the linked list of magical features. Any prior entry
of the same type of magic is deleted. Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
whether C<namlen> is greater than zero or equal to zero respectively. As a
special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
to contain an C<SV*> and is stored as-is with its REFCNT incremented.

The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L</Magic Virtual Tables> section below. The C<how> argument is also
stored in the C<mg_type> field. The value of
C<how> should be chosen from the set of macros
C<PERL_MAGIC_foo> found in F<perl.h>.
Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented. If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, C<PERL_MAGIC_regdatum>,
C<PERL_MAGIC_regdata>, or if it is a NULL pointer, then C<obj> is merely
stored, without the reference count being incremented.

See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
to an SV.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function sv_unmagic:

    int sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

However, note that C<sv_unmagic> removes all magic of a certain C<type> from the
C<SV>. If you want to remove only certain
magic of a C<type> based on the magic
virtual table, use C<sv_unmagicext> instead:

    int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL>, which is a structure of function pointers and stands for
"Magic Virtual Table" to handle the various operations that might be
applied to that variable.

=for apidoc_section $magic
=for apidoc Ayh||MGVTBL

The C<MGVTBL> has five (or sometimes eight) pointers to the following
routine types:

    int  (*svt_get)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_set)  (pTHX_ SV* sv, MAGIC* mg);
    U32  (*svt_len)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_free) (pTHX_ SV* sv, MAGIC* mg);

    int  (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
                                          const char *name, I32 namlen);
    int  (*svt_dup)  (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
    int  (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);


This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 32 types. These different structures contain pointers to various
routines that perform additional actions depending on which function is
being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is
                        retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

    svt_copy            copy tied variable magic to a tied element
    svt_dup             duplicate a magic structure during thread cloning
    svt_local           copy magic to local value during 'local'

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called. All the various routines for the various magical types begin
with C<magic_>. NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.

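To make the dispatch concrete, here is a deliberately simplified model in
plain C. The C<toy_sv>, C<toy_magic> and C<toy_vtbl> types and the
C<toy_get>/C<toy_set> functions are hypothetical stand-ins, not Perl's real
structures or API. The sketch assumes only what is described above: magic
entries form a linked list, each entry may carry a table of function
pointers, C<svt_get> runs before a value is retrieved, and C<svt_set> runs
after it is assigned.

```c
#include <stddef.h>

/* Simplified stand-ins for Perl's SV, MAGIC and MGVTBL.  The toy_* names
 * are hypothetical and much smaller than the real structures. */
typedef struct toy_sv toy_sv;
typedef struct toy_magic toy_magic;

typedef struct {
    int (*svt_get)(toy_sv *sv, toy_magic *mg);  /* run before a read  */
    int (*svt_set)(toy_sv *sv, toy_magic *mg);  /* run after a write  */
} toy_vtbl;

struct toy_magic {
    toy_magic      *mg_moremagic;  /* next entry in the linked list   */
    const toy_vtbl *mg_virtual;    /* may be NULL, like "(none)" types */
    int             reads;         /* private counter for this demo   */
};

struct toy_sv {
    long       value;
    toy_magic *magic;              /* head of the magic list, or NULL */
};

/* A 'get' hook: refresh the value just before it is read. */
static int count_get(toy_sv *sv, toy_magic *mg)
{
    mg->reads++;
    sv->value = 100 + mg->reads;
    return 0;
}

/* Unused slots are left NULL, much as vtbl_sv leaves slots as 0. */
static const toy_vtbl count_vtbl = { count_get, NULL };

/* Read the value, first calling every non-NULL svt_get in the list. */
static long toy_get(toy_sv *sv)
{
    toy_magic *mg;
    for (mg = sv->magic; mg; mg = mg->mg_moremagic)
        if (mg->mg_virtual && mg->mg_virtual->svt_get)
            mg->mg_virtual->svt_get(sv, mg);
    return sv->value;
}

/* Write the value, then call every non-NULL svt_set in the list. */
static void toy_set(toy_sv *sv, long v)
{
    toy_magic *mg;
    sv->value = v;
    for (mg = sv->magic; mg; mg = mg->mg_moremagic)
        if (mg->mg_virtual && mg->mg_virtual->svt_set)
            mg->mg_virtual->svt_set(sv, mg);
}
```

A scalar with no magic list is read and written directly, while one
carrying C<count_vtbl> has its value recomputed on every read. The real
mechanism has more slots and passes an interpreter context, so treat this
only as a picture of the control flow.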
1423 1424The last three slots are a recent addition, and for source code 1425compatibility they are only checked for if one of the three flags 1426C<MGf_COPY>, C<MGf_DUP>, or C<MGf_LOCAL> is set in mg_flags. 1427This means that most code can continue declaring 1428a vtable as a 5-element value. These three are 1429currently used exclusively by the threading code, and are highly subject 1430to change. 1431 1432=for apidoc_section $magic 1433=for apidoc Amnh||MGf_COPY 1434=for apidoc_item ||MGf_DUP 1435=for apidoc_item ||MGf_LOCAL 1436 1437The current kinds of Magic Virtual Tables are: 1438 1439=for comment 1440This table is generated by regen/mg_vtable.pl. Any changes made here 1441will be lost. 1442 1443=for mg_vtable.pl begin 1444 1445 mg_type 1446 (old-style char and macro) MGVTBL Type of magic 1447 -------------------------- ------ ------------- 1448 \0 PERL_MAGIC_sv vtbl_sv Special scalar variable 1449 # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary) 1450 % PERL_MAGIC_rhash (none) Extra data for restricted 1451 hashes 1452 * PERL_MAGIC_debugvar vtbl_debugvar $DB::single, signal, trace 1453 vars 1454 . 
PERL_MAGIC_pos vtbl_pos pos() lvalue 1455 : PERL_MAGIC_symtab (none) Extra data for symbol 1456 tables 1457 < PERL_MAGIC_backref vtbl_backref For weak ref data 1458 @ PERL_MAGIC_arylen_p (none) To move arylen out of XPVAV 1459 B PERL_MAGIC_bm vtbl_regexp Boyer-Moore 1460 (fast string search) 1461 c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table 1462 (AMT) on stash 1463 D PERL_MAGIC_regdata vtbl_regdata Regex match position data 1464 (@+ and @- vars) 1465 d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data 1466 element 1467 E PERL_MAGIC_env vtbl_env %ENV hash 1468 e PERL_MAGIC_envelem vtbl_envelem %ENV hash element 1469 f PERL_MAGIC_fm vtbl_regexp Formline 1470 ('compiled' format) 1471 g PERL_MAGIC_regex_global vtbl_mglob m//g target 1472 H PERL_MAGIC_hints vtbl_hints %^H hash 1473 h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element 1474 I PERL_MAGIC_isa vtbl_isa @ISA array 1475 i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element 1476 k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue 1477 L PERL_MAGIC_dbfile (none) Debugger %_<filename 1478 l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename 1479 element 1480 N PERL_MAGIC_shared (none) Shared between threads 1481 n PERL_MAGIC_shared_scalar (none) Shared between threads 1482 o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation 1483 P PERL_MAGIC_tied vtbl_pack Tied array or hash 1484 p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element 1485 q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle 1486 r PERL_MAGIC_qr vtbl_regexp Precompiled qr// regex 1487 S PERL_MAGIC_sig vtbl_sig %SIG hash 1488 s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element 1489 t PERL_MAGIC_taint vtbl_taint Taintedness 1490 U PERL_MAGIC_uvar vtbl_uvar Available for use by 1491 extensions 1492 u PERL_MAGIC_uvar_elem (none) Reserved for use by 1493 extensions 1494 V PERL_MAGIC_vstring (none) SV was vstring literal 1495 v PERL_MAGIC_vec vtbl_vec vec() lvalue 1496 w PERL_MAGIC_utf8 vtbl_utf8 Cached 
UTF-8 information 1497 x PERL_MAGIC_substr vtbl_substr substr() lvalue 1498 Y PERL_MAGIC_nonelem vtbl_nonelem Array element that does not 1499 exist 1500 y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator 1501 variable / smart parameter 1502 vivification 1503 \ PERL_MAGIC_lvref vtbl_lvref Lvalue reference 1504 constructor 1505 ] PERL_MAGIC_checkcall vtbl_checkcall Inlining/mutation of call 1506 to this CV 1507 ~ PERL_MAGIC_ext (none) Available for use by 1508 extensions 1509 1510 1511=for apidoc_section $magic 1512=for apidoc AmnhU||PERL_MAGIC_arylen 1513=for apidoc_item ||PERL_MAGIC_arylen_p 1514=for apidoc_item ||PERL_MAGIC_backref 1515=for apidoc_item ||PERL_MAGIC_bm 1516=for apidoc_item ||PERL_MAGIC_checkcall 1517=for apidoc_item ||PERL_MAGIC_collxfrm 1518=for apidoc_item ||PERL_MAGIC_dbfile 1519=for apidoc_item ||PERL_MAGIC_dbline 1520=for apidoc_item ||PERL_MAGIC_debugvar 1521=for apidoc_item ||PERL_MAGIC_defelem 1522=for apidoc_item ||PERL_MAGIC_env 1523=for apidoc_item ||PERL_MAGIC_envelem 1524=for apidoc_item ||PERL_MAGIC_ext 1525=for apidoc_item ||PERL_MAGIC_fm 1526=for apidoc_item ||PERL_MAGIC_hints 1527=for apidoc_item ||PERL_MAGIC_hintselem 1528=for apidoc_item ||PERL_MAGIC_isa 1529=for apidoc_item ||PERL_MAGIC_isaelem 1530=for apidoc_item ||PERL_MAGIC_lvref 1531=for apidoc_item ||PERL_MAGIC_nkeys 1532=for apidoc_item ||PERL_MAGIC_nonelem 1533=for apidoc_item ||PERL_MAGIC_overload_table 1534=for apidoc_item ||PERL_MAGIC_pos 1535=for apidoc_item ||PERL_MAGIC_qr 1536=for apidoc_item ||PERL_MAGIC_regdata 1537=for apidoc_item ||PERL_MAGIC_regdatum 1538=for apidoc_item ||PERL_MAGIC_regex_global 1539=for apidoc_item ||PERL_MAGIC_rhash 1540=for apidoc_item ||PERL_MAGIC_shared 1541=for apidoc_item ||PERL_MAGIC_shared_scalar 1542=for apidoc_item ||PERL_MAGIC_sig 1543=for apidoc_item ||PERL_MAGIC_sigelem 1544=for apidoc_item ||PERL_MAGIC_substr 1545=for apidoc_item ||PERL_MAGIC_sv 1546=for apidoc_item ||PERL_MAGIC_symtab 1547=for apidoc_item 
||PERL_MAGIC_taint 1548=for apidoc_item ||PERL_MAGIC_tied 1549=for apidoc_item ||PERL_MAGIC_tiedelem 1550=for apidoc_item ||PERL_MAGIC_tiedscalar 1551=for apidoc_item ||PERL_MAGIC_utf8 1552=for apidoc_item ||PERL_MAGIC_uvar 1553=for apidoc_item ||PERL_MAGIC_uvar_elem 1554=for apidoc_item ||PERL_MAGIC_vec 1555=for apidoc_item ||PERL_MAGIC_vstring 1556 1557=for mg_vtable.pl end 1558 1559When an uppercase and lowercase letter both exist in the table, then the 1560uppercase letter is typically used to represent some kind of composite type 1561(a list or a hash), and the lowercase letter is used to represent an element 1562of that composite type. Some internals code makes use of this case 1563relationship. However, 'v' and 'V' (vec and v-string) are in no way related. 1564 1565The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined 1566specifically for use by extensions and will not be used by perl itself. 1567Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information 1568to variables (typically objects). This is especially useful because 1569there is no way for normal perl code to corrupt this private information 1570(unlike using extra elements of a hash object). 1571 1572Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a 1573C function any time a scalar's value is used or changed. The C<MAGIC>'s 1574C<mg_ptr> field points to a C<ufuncs> structure: 1575 1576 struct ufuncs { 1577 I32 (*uf_val)(pTHX_ IV, SV*); 1578 I32 (*uf_set)(pTHX_ IV, SV*); 1579 IV uf_index; 1580 }; 1581 1582When the SV is read from or written to, the C<uf_val> or C<uf_set> 1583function will be called with C<uf_index> as the first arg and a pointer to 1584the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar> 1585magic is shown below. Note that the ufuncs structure is copied by 1586sv_magic, so you can safely allocate it on the stack. 

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));

Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect.

For hashes there is a specialized hook that gives control over hash
keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic
if the "set" function in the C<ufuncs> structure is NULL. The hook
is activated whenever the hash is accessed with a key specified as
an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>,
C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string
through the functions without the C<..._ent> suffix circumvents the
hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description.

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict. Typically only using the magic on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an
C<MGVTBL>, even if all its fields will be C<0>, so that individual
C<MAGIC> pointers can be identified as a particular kind of magic
using their magic virtual table. C<mg_findext> provides an easy way
to do that:

    STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };

    MAGIC *mg;
    if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
        /* this is really ours, not another module's PERL_MAGIC_ext */
        my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
        ...
    }

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

    MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
                                       * type */

This routine returns a pointer to a C<MAGIC> structure stored in the SV.
If the SV does not have that magical
feature, C<NULL> is returned. If the
SV has multiple instances of that magical feature, the first one will be
returned. C<mg_findext> can be used
to find a C<MAGIC> structure of an SV
based on both its magic type and its magic virtual table:

    MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);

Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type
SVt_PVMG, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has. If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET, etc methods. To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps -- firstly it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class -
see L</Calling Perl Routines from within C Programs> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", GV_ADD);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>. It may also return NULL, indicating that the value did not
actually need to be stored in the array. [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object. If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>. Note the value so returned does not
need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object. Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes. They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects. Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants. The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax. The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

  {
    local $var = 2;
    ...
  }

This construction is I<approximately> equivalent to

  {
    my $oldvar = $var;
    $var = 2;
    ...
    $var = $oldvar;
  }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
more efficient as well.

There is a way to achieve a similar task from C via Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization must
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.
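To make the I<pseudo-block> idea concrete, here is a small self-contained toy model in plain C. It is only a sketch of the general technique (record an address and its old value, then restore on exit); it is not Perl's savestack code, it does not model non-local exits via die(), and every C<toy_*> name is hypothetical and does not exist in the Perl API. C<toy_enter>, C<toy_save_int> and C<toy_leave> stand in loosely for C<ENTER>, C<SAVEINT> and C<LEAVE>.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not part of perl. */
#define TOY_SAVE_MAX 64

typedef struct {
    int *addr;    /* variable to restore */
    int  oldval;  /* value it held when it was saved */
} toy_save_entry;

static toy_save_entry toy_savestack[TOY_SAVE_MAX];
static size_t toy_save_ix = 0;               /* next free savestack slot */
static size_t toy_scope_marks[TOY_SAVE_MAX]; /* savestack height at each enter */
static size_t toy_scope_ix = 0;

/* Loosely like ENTER: remember where this pseudo-block's entries begin. */
static void toy_enter(void)
{
    assert(toy_scope_ix < TOY_SAVE_MAX);
    toy_scope_marks[toy_scope_ix++] = toy_save_ix;
}

/* Loosely like SAVEINT(i): record the address and the current value. */
static void toy_save_int(int *p)
{
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].addr = p;
    toy_savestack[toy_save_ix].oldval = *p;
    toy_save_ix++;
}

/* Loosely like LEAVE: undo, newest first, everything saved since the
 * matching toy_enter(). */
static void toy_leave(void)
{
    size_t base;
    assert(toy_scope_ix > 0);
    base = toy_scope_marks[--toy_scope_ix];
    while (toy_save_ix > base) {
        toy_save_entry *e = &toy_savestack[--toy_save_ix];
        *e->addr = e->oldval;
    }
}

/* Temporarily set *var inside a pseudo-block; returns the value that was
 * visible inside the block. On return, *var has its old value again. */
static int toy_localized_set(int *var, int newval)
{
    int seen;
    toy_enter();
    toy_save_int(var);
    *var = newval;
    seen = *var;
    toy_leave();
    return seen;
}
```

The point of the design is that the code which changes the variable never has to restore it by hand; leaving the pseudo-block does so.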

Inside such a I<pseudo-block> the following service is available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

=item C<SAVEI8(I8 i)>

=item C<SAVEI16(I16 i)>

=item C<SAVEBOOL(int i)>

=item C<SAVESTRLEN(STRLEN i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of the enclosing I<pseudo-block>.

=for apidoc_section $callback
=for apidoc Amh||SAVEINT|int i
=for apidoc Amh||SAVEIV|IV i
=for apidoc Amh||SAVEI32|I32 i
=for apidoc Amh||SAVELONG|long i
=for apidoc Amh||SAVEI8|I8 i
=for apidoc Amh||SAVEI16|I16 i
=for apidoc Amh||SAVEBOOL|bool i
=for apidoc Amh||SAVESTRLEN|STRLEN i

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.

=for apidoc Amh||SAVESPTR|SV * s
=for apidoc Amh||SAVEPPTR|char * p

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of
I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope. These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=for apidoc Amh||SAVEFREESV|SV* sv

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count. This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=for apidoc Amh||SAVEMORTALIZESV|SV* sv

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of I<pseudo-block>.

=for apidoc Amh||SAVEFREEOP|OP *op

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.

=for apidoc Amh||SAVEFREEPV|void * p

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed. If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

  SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=for apidoc Amh||SAVEDELETE|HV * hv|char * key|I32 length

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.

=for apidoc Ayh||DESTRUCTORFUNC_NOCONTEXT_t
=for apidoc Amh||SAVEDESTRUCTOR|DESTRUCTORFUNC_NOCONTEXT_t f|void *p

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=for apidoc Ayh||DESTRUCTORFUNC_t
=for apidoc Amh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=for apidoc Amh||SAVESTACK_POS

=back

The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s). Where the above macros take C<int>, a similar
function takes C<int *>.

Other macros above have functions implementing them, but it's probably
best to just use the macro, and not those or the ones below.

=over 4

=item C<SV* save_scalar(GV *gv)>

=for apidoc save_scalar

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=for apidoc save_ary

=item C<HV* save_hash(GV *gv)>

=for apidoc save_hash

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

=for apidoc save_item

Duplicates the current value of C<SV>. On the exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block> the value of C<SV> will be restored
using the stored value. It doesn't handle magic. Use C<save_scalar> if
magic is affected.

=item C<void save_list(SV **sarg, I32 maxsarg)>

=for apidoc save_list

A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.

=item C<SV* save_svref(SV **sptr)>

=for apidoc save_svref

Similar to C<save_scalar>, but will reinstate an C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

=for apidoc save_aptr
=for apidoc save_hptr

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>. People who are interested in how to localize things in
the containing scope should take a look there too.
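As a rough picture of how destructor-style entries such as C<SAVEDESTRUCTOR> behave, the following self-contained toy in plain C queues callbacks and runs them when the block is left, most recently queued first. It is a sketch under stated assumptions, not Perl's implementation: all C<toy_*> names are hypothetical, only a single block is modeled, and the newest-first order reflects the stack-like way saved entries are unwound.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: all toy_* names are hypothetical, not part of perl. */
#define TOY_MAX 16

typedef void (*toy_destructor_t)(void *);

typedef struct {
    toy_destructor_t fn;  /* function to call at block exit */
    void *arg;            /* its single argument */
} toy_cleanup;

static toy_cleanup toy_cleanups[TOY_MAX];
static size_t toy_ncleanups = 0;

/* Loosely like SAVEDESTRUCTOR(f, p): queue f(p) for the end of the block. */
static void toy_save_destructor(toy_destructor_t fn, void *arg)
{
    assert(toy_ncleanups < TOY_MAX);
    toy_cleanups[toy_ncleanups].fn = fn;
    toy_cleanups[toy_ncleanups].arg = arg;
    toy_ncleanups++;
}

/* Loosely like LEAVE for one block: run queued callbacks, newest first. */
static void toy_leave_scope(void)
{
    while (toy_ncleanups > 0) {
        toy_cleanup *c = &toy_cleanups[--toy_ncleanups];
        c->fn(c->arg);
    }
}

static char toy_log[TOY_MAX + 1];
static size_t toy_loglen = 0;

/* Example callback: append the character that p points at to toy_log. */
static void toy_note(void *p)
{
    assert(toy_loglen < TOY_MAX);
    toy_log[toy_loglen++] = *(const char *)p;
    toy_log[toy_loglen] = '\0';
}

/* Queue two callbacks, leave the block, and return the resulting log. */
static const char *toy_demo(void)
{
    static const char a = 'a', b = 'b';
    toy_loglen = 0;
    toy_log[0] = '\0';
    toy_save_destructor(toy_note, (void *)&a);
    toy_save_destructor(toy_note, (void *)&b);
    toy_leave_scope();
    return toy_log;
}
```

Because the callback for C<b> was queued last, it runs first, so cleanup code can rely on anything registered earlier still being intact.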

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument. Argument 0 is the first argument passed in the
Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives. However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro. The pushed values will often need to be "mortal" (see
L</Reference Counts and Mortality>):

    PUSHs(sv_2mortal(newSViv(an_integer)))
    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
    PUSHs(sv_2mortal(newSVnv(a_double)))
    PUSHs(sv_2mortal(newSVpv("Some String",0)))
    /* Although the last example is better written as the more
     * efficient: */
    PUSHs(newSVpvs_flags("Some String", SVs_TEMP))

Now, in the Perl program calling C<tzname>, the two values will be assigned
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method to pushing values on the stack is
to use the macro:

    XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.

Despite what earlier versions of this document suggested, the macros
C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.

For more information, consult L<perlxs> and L<perlxstut>.

=head2 Autoloading with XSUBs

If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable
of the XSUB's package.

But it also puts the same information in certain fields of the XSUB itself:

    HV *stash           = CvSTASH(cv);
    const char *subname = SvPVX(cv);
    STRLEN name_length  = SvCUR(cv); /* in bytes */
    U32 is_utf8         = SvUTF8(cv);

C<SvPVX(cv)> contains just the sub name itself, not including the package.
For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package.

B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the
XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need
to support 5.8-5.14, use the XSUB's fields.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program. These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, char**);

The routine most often used is C<call_sv>. The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine. The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of arguments that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack. These include the following macros and
functions:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.

=head2 Putting a C value on Perl stack

A lot of opcodes (an opcode is an elementary operation in the internal perl
stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L</Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on stack.

The macro to put this target on stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[iunp]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack. The following code will not do what you think:

    XPUSHi(10);
    XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20.

If you need to push multiple different values then you should either use
the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an
SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make
this a little easier to achieve by creating a new mortal for you (via
C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
Thus, instead of writing this to "fix" the example above:

    XPUSHs(sv_2mortal(newSViv(10)))
    XPUSHs(sv_2mortal(newSViv(20)))

you can simply write:

    mXPUSHi(10)
    mXPUSHi(20)

On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>. See also
C<dTARGET> and C<dXSTARG>.

=head2 Scratchpads

The question remains on when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current
unit--a subroutine or a file (for opcodes for statements outside of
subroutines)--is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.

A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes. A previous version of this document
stated that one can deduce that an SV lives on a scratchpad
by looking on its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set. But this has never been fully true.
C<SVs_PADMY> could be set on a variable that no longer resides in any pad.
While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables
that have never resided in a pad, but nonetheless act like I<target>s. As
of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as
0. C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>.

=for apidoc_section $pad
=for apidoc Amnh||SVs_PADTMP
=for apidoc AmnhD||SVs_PADMY

The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.
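The consequence of reusing a single I<target>, described under L</Putting a C value on Perl stack>, can be reproduced with a self-contained toy in plain C. This is only an illustrative sketch: the "stack" holds pointers to a stand-in struct rather than real SVs, and every C<toy_*> name is hypothetical. C<toy_push_targ> loosely mimics what C<XPUSHi> does with C<TARG>, and C<toy_push_fresh> loosely mimics C<mXPUSHi> taking a new mortal each time.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not part of perl. */
typedef struct { long iv; } toy_sv;      /* stand-in for an SV holding an IV */

#define TOY_STACK_MAX 8
static toy_sv *toy_stack[TOY_STACK_MAX]; /* a stack of pointers, like SV** */
static size_t toy_sp = 0;

static toy_sv toy_targ;                  /* the one reusable target */
static toy_sv toy_temps[TOY_STACK_MAX];  /* pool standing in for new mortals */
static size_t toy_ntemps = 0;

static void toy_reset(void)
{
    toy_sp = 0;
    toy_ntemps = 0;
}

/* Loosely like XPUSHi: set the shared target, then push a pointer to it. */
static void toy_push_targ(long v)
{
    assert(toy_sp < TOY_STACK_MAX);
    toy_targ.iv = v;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Loosely like mXPUSHi: take a fresh value holder, set it, and push it. */
static void toy_push_fresh(long v)
{
    assert(toy_sp < TOY_STACK_MAX && toy_ntemps < TOY_STACK_MAX);
    toy_temps[toy_ntemps].iv = v;
    toy_stack[toy_sp++] = &toy_temps[toy_ntemps++];
}

/* Read back the integer that stack slot i currently refers to. */
static long toy_stack_value(size_t i)
{
    assert(i < toy_sp);
    return toy_stack[i]->iv;
}
```

Pushing 10 and then 20 with C<toy_push_targ> leaves two pointers to the same object, so both slots read 20; with C<toy_push_fresh> the slots read 10 and 20.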

=head2 Scratchpads and recursion

In fact it is not 100% true that a compiled unit contains a pointer to
the scratchpad AV. In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe B<threads>. Both
these can create several execution pointers going into the same
subroutine. For the subroutine-child not to write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.

=head1 Memory Allocation

=head2 Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section. The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

The following three macros are used to initially allocate memory:

    Newx(pointer, number, type);
    Newxc(pointer, number, type, cast);
    Newxz(pointer, number, type);

The first argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The second and third arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated. The argument
C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
to zero out all the newly allocated memory.

=head2 Reallocation

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer)

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
match those of C<Newx> and C<Newxc>.

=head2 Moving

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory. The C<source> and C<dest> arguments point to the source and
destination starting points. Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
function).

=head1 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used. This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with. All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

  $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:

     $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line. The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier); only 3 of them are
not optimized away (one per number in the left column). The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, thus it is C<3
4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<OpSIBLING> pointer from the first child to the last (but
see below).

=for apidoc_section $optree_construction
=for apidoc Ayh||OP
=for apidoc Ayh||BINOP
=for apidoc Ayh||LISTOP
=for apidoc Ayh||UNOP

There are also some other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.

=for apidoc Ayh||LOOP
=for apidoc Ayh||PMOP

Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one
or more children, but it doesn't have an C<op_last> field: so you have to
follow C<op_first> and then the C<OpSIBLING> chain itself to find the
last child. Instead it has an C<op_other> field, which is comparable to
the C<op_next> field described below, and represents an alternate
execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note
that in general, C<op_other> may not point to any of the direct children
of the C<LOGOP>.

=for apidoc Ayh||LOGOP

Starting in version 5.21.2, perls built with the experimental
define C<-DPERL_OP_PARENT> add an extra boolean flag for each op,
C<op_moresib>. When not set, this indicates that this is the last op in an
C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last
sibling to point back to the parent op. Under this build, that field is
also renamed C<op_sibparent> to reflect its joint role. The macro
C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on
the last sibling. With this build the C<op_parent(o)> function can be
used to find the parent of any op. Thus for forward compatibility, you
should always use the C<OpSIBLING(o)> macro rather than accessing
C<op_sibling> directly.

Another way to examine the tree is to use a compiler back-end module, such
as L<B::Concise>.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.
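The relationship between the tree structure and the execution-order "thread" described under L</Examining the tree> can be pictured with a self-contained toy in plain C. This is a sketch under stated assumptions, not perl's own linking code: the struct is a drastically reduced stand-in for the real op structures, the C<null> ops of the real tree are left out, and every C<toy_*> name is hypothetical. The fields C<first>, C<sibling> and C<next> play the roles of C<op_first>, the C<OpSIBLING> chain and C<op_next>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: all toy_* names are hypothetical, not part of perl. */
typedef struct toy_op {
    const char *name;
    struct toy_op *first;    /* first child, in the role of op_first */
    struct toy_op *sibling;  /* next sibling, in the role of OpSIBLING() */
    struct toy_op *next;     /* execution-order thread, like op_next */
} toy_op;

/* Thread the subtree rooted at o in post-order (children before parent).
 * tail is the last op threaded so far; returns the new last op. */
static toy_op *toy_link(toy_op *o, toy_op *tail)
{
    toy_op *kid;
    for (kid = o->first; kid; kid = kid->sibling)
        tail = toy_link(kid, tail);
    if (tail)
        tail->next = o;
    o->next = NULL;
    return o;
}

/* The first op to execute is found by descending through first children. */
static toy_op *toy_start(toy_op *o)
{
    while (o->first)
        o = o->first;
    return o;
}

/* Build the tree for $b + $c, thread it, and write the execution order
 * into buf as space-separated op names. */
static const char *toy_demo(char *buf, size_t buflen)
{
    toy_op c   = { "gvsv(c)", NULL, NULL, NULL };
    toy_op b   = { "gvsv(b)", NULL, &c,   NULL };
    toy_op add = { "add",     &b,   NULL, NULL };
    toy_op *o;

    toy_link(&add, NULL);
    buf[0] = '\0';
    for (o = toy_start(&add); o; o = o->next) {
        if (buf[0])
            strncat(buf, " ", buflen - strlen(buf) - 1);
        strncat(buf, o->name, buflen - strlen(buf) - 1);
    }
    return buf;
}
```

Walking the C<next> pointers visits both operands before the operator, which is the C<gvsv gvsv add> order shown in the C<-Dx> listing.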

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass. This is optimization by
so-called "check routines". The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread. Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, the check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines (or C<convert>) (which in turn are
called from F<perly.y>).

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called the returned node is
checked for being compile-time executable. If it is (the value is
judged to be constant) it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead. The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.

=head2 Compile pass 2: context propagation

When a context for a part of the compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of being free()d (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). Optimizations performed
at this stage are subject to the same restrictions as in pass 2.

Peephole optimizations are done by calling the function pointed to
by the global variable C<PL_peepp>. By default, C<PL_peepp> just
calls the function pointed to by the global variable C<PL_rpeepp>.
By default, that performs some basic op fixups and optimisations along
the execution-order op chain, and recursively calls C<PL_rpeepp> for
each side chain of ops (resulting from conditionals).
Extensions may
provide additional optimisations or fixups, hooking into either the
per-subroutine or recursive stage, like this:

    static peep_t prev_peepp;
    static void my_peep(pTHX_ OP *o)
    {
        /* custom per-subroutine optimisation goes here */
        prev_peepp(aTHX_ o);
        /* custom per-subroutine optimisation may also go here */
    }
    BOOT:
        prev_peepp = PL_peepp;
        PL_peepp = my_peep;

    static peep_t prev_rpeepp;
    static void my_rpeep(pTHX_ OP *first)
    {
        /* o advances two ops per iteration and t one, so o == t
           means the op_next chain has looped back on itself */
        OP *o = first, *t = first;
        for(; o; o = o->op_next, t = t->op_next) {
            /* custom per-op optimisation goes here */
            o = o->op_next;
            if (!o || o == t) break;
            /* custom per-op optimisation goes AND here */
        }
        prev_rpeepp(aTHX_ first);
    }
    BOOT:
        prev_rpeepp = PL_rpeepp;
        PL_rpeepp = my_rpeep;

=for apidoc_section $optree_manipulation
=for apidoc Ayh||peep_t

=head2 Pluggable runops

The compile tree is executed in a runops function. There are two runops
functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
control over the execution of the compile tree it is possible to provide
your own runops function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs. Then, in the BOOT section of your XS
file, add the line:

    PL_runops = my_runops;

=for apidoc_section $debugging
=for apidoc runops_debug
=for apidoc runops_standard
=for apidoc Amnh|runops_proc_t|PL_runops

This function should be as efficient as possible to keep your programs
running as fast as possible.

=head2 Compile-time scope hooks

As of perl 5.14 it is possible to hook into the compile-time lexical
scope mechanism using C<Perl_blockhook_register>.
This is used like
this:

    STATIC void my_start_hook(pTHX_ int full);
    STATIC BHK my_hooks;

    BOOT:
        BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
        Perl_blockhook_register(aTHX_ &my_hooks);

This will arrange to have C<my_start_hook> called at the start of
compiling every lexical scope. The available hooks are:

=for apidoc_section $lexer
=for apidoc Ayh||BHK

=over 4

=item C<void bhk_start(pTHX_ int full)>

This is called just after starting a new lexical scope. Note that Perl
code like

    if ($x) { ... }

creates two scopes: the first starts at the C<(> and has C<full == 1>,
the second starts at the C<{> and has C<full == 0>. Both end at the
C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything
pushed onto the save stack by this hook will be popped just before the
scope ends (between the C<pre_> and C<post_end> hooks, in fact).

=item C<void bhk_pre_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just before unwinding the
stack. I<o> is the root of the optree representing the scope; it is a
double pointer so you can replace the OP if you need to.

=item C<void bhk_post_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just after unwinding the
stack. I<o> is as above. Note that it is possible for calls to C<pre_>
and C<post_end> to nest, if there is something on the save stack that
calls string eval.

=item C<void bhk_eval(pTHX_ OP *const o)>

This is called just before starting to compile an C<eval STRING>, C<do
FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
OP that requested the eval, and will normally be an C<OP_ENTEREVAL>,
C<OP_DOFILE> or C<OP_REQUIRE>.

=back

Once you have your hook functions, you need a C<BHK> structure to put
them in.
It's best to allocate it statically, since there is no way to
free it once it's registered. The function pointers should be inserted
into this structure using the C<BhkENTRY_set> macro, which will also set
flags indicating which entries are valid. If you do need to allocate
your C<BHK> dynamically for some reason, be sure to zero it before you
start.

Once registered, there is no mechanism to switch these hooks off, so if
that is necessary you will need to do this yourself. An entry in C<%^H>
is probably the best way, so the effect is lexically scoped; however it
is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to
temporarily switch entries on and off. You should also be aware that
generally speaking at least one scope will have opened before your
extension is loaded, so you will see some C<pre>/C<post_end> pairs that
didn't have a matching C<start>.

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

=for apidoc_section $debugging
=for apidoc dump_eval

Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
op tree, and C<Perl_dump_packsubs>, which calls C<Perl_dump_sub> on all
the subroutines in a package, like so (thankfully, these are all xsubs,
so there is no op tree):

=for apidoc_section $debugging
=for apidoc dump_sub

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

and C<Perl_dump_all>, which dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and MULTIPLICITY

=for apidoc_section $concurrency
=for apidoc Amnh||PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use. This smells a lot like an object, and
there is a way for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure,
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.

The macro that controls the major Perl build flavor is MULTIPLICITY. The
MULTIPLICITY build has a C structure that packages all the interpreter
state, which is being passed to various perl functions as a "hidden"
first argument. MULTIPLICITY makes multi-threaded perls possible (with the
ithreads threading model, related to the macro USE_ITHREADS).

PERL_IMPLICIT_CONTEXT is a legacy synonym for MULTIPLICITY.
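
The idea of packaging all interpreter state into one structure and
passing it as the first argument can be tried in isolation. The sketch
below is a toy model only: C<toy_interp>, C<toy_init>,
C<toy_run_statement> and C<toy_statements_run> are hypothetical names
made up for this illustration, and the real interpreter structure is far
larger.

```c
#include <assert.h>
#include <stddef.h>

/* Toy "interpreter": every piece of state lives in one structure.
 * Hypothetical; this is not perl's real interpreter structure. */
typedef struct toy_interp {
    long statement_count;   /* an example of per-interpreter state */
} toy_interp;

/* Prepare a fresh interpreter. */
static void toy_init(toy_interp *my_perl)
{
    assert(my_perl != NULL);
    my_perl->statement_count = 0;
}

/* Every "API function" receives the context as its first argument,
 * so it never has to consult global variables. */
static void toy_run_statement(toy_interp *my_perl)
{
    my_perl->statement_count++;
}

/* Read back state belonging to one particular interpreter. */
static long toy_statements_run(const toy_interp *my_perl)
{
    return my_perl->statement_count;
}
```

Because nothing is global, two C<toy_interp> values can be driven
independently without their counters interfering. A build without such
a structure would instead keep C<statement_count> in a global variable,
which limits the process to a single interpreter.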

=for apidoc_section $concurrency
=for apidoc Amnh||MULTIPLICITY

To see whether you have non-const data you can use a BSD (or GNU)
compatible C<nm>:

    nm libperl.a | grep -v ' [TURtr] '

If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>),
you have non-const data. The symbols the C<grep> removed are as follows:
C<Tt> are I<text>, or code, the C<Rr> are I<read-only> (const) data,
and the C<U> is I<undefined>, external symbols referred to.

The test F<t/porting/libperl.t> does this kind of symbol sanity
checking on C<libperl.a>.

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin C<S_> are private
(think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be
B<sure> a function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), submit an issue at
L<https://github.com/Perl/perl5/issues> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing. To solve this, the subroutines are named and
declared in a particular way.
Here's a typical start of a static
function used within the Perl guts:

    STATIC void
    S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.

=for apidoc_section $directives
=for apidoc Ayh||STATIC

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

    void
    Perl_sv_setiv(pTHX_ SV* dsv, IV num)

C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

=for apidoc_section $concurrency
=for apidoc Amnh||aTHX
=for apidoc Amnh||aTHX_
=for apidoc Amnh||dTHX
=for apidoc Amnh||pTHX
=for apidoc Amnh||pTHX_

When Perl is built without options that set MULTIPLICITY, there is no
first argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
MULTIPLICITY is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.

When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setiv>.
It expands into
something like this:

    #ifdef MULTIPLICITY
      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setiv           Perl_sv_setiv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setiv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance. Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead
it does C<dTHX;> to get the context from thread-local storage. We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance. (Passing an arg is
cheaper than grabbing it from thread-local storage.)

You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

=for apidoc_section $concurrency
=for apidoc Amnh||dTHR

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with MULTIPLICITY, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow. The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with MULTIPLICITY enabled.

There are three ways to do this. First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

    sv_setiv(sv, num);

in your extension will translate to this when MULTIPLICITY is
in effect:

    Perl_sv_setiv(Perl_get_context(), sv, num);

or to this otherwise:

    Perl_sv_setiv(sv, num);

You don't have to do anything new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your Foo.xs:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    STATIC void my_private_function(int arg1, int arg2);

    STATIC void
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        ... call many Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API.
(You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.

The third, even more efficient way is to ape how it is done within
the Perl guts:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    STATIC void my_private_function(pTHX_ int arg1, int arg2);

    STATIC void
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit
arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards. If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

    /* do this before doing anything else with some_perl */
    PERL_SET_CONTEXT(some_perl);

    ... other Perl API calls on some_perl go here ...

=for apidoc_section $embedding
=for apidoc Amh|void|PERL_SET_CONTEXT|PerlInterpreter* i

(You can always get the current context via C<PERL_GET_CONTEXT>.)

=for apidoc Amnh|PerlInterpreter*|PERL_GET_CONTEXT|

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as MULTIPLICITY provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
Windows.

This provides an extra pointer (called the "host" environment) for all
the system calls, which makes it possible for all the system stuff to
maintain its own state, broken down into seven C structures. These are
thin wrappers around the usual system
calls (see F<win32/perllib.c>) for the default perl executable, but for a
more ambitious host (like the one that would do fork() emulation) all
the extra work needed to pretend that different interpreters are
actually different "processes", would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
either with or without the C<Perl_> prefix, thanks to a bunch of defines
that live in F<embed.h>. Note that extension code should I<not> set
C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
breakage of the XS in each new perl release.

The file F<embed.h> is generated automatically from
F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces. It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well. Here's a sample entry from
that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The first column is a set of flags, the second column the return type,
the third column the name. Columns after that are the arguments.
The flags are documented at the top of F<embed.fnc>.

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.
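
To make the column layout of such an entry concrete, here is a small
sketch in plain C that splits one entry on C<|> and trims the padding.
It is an illustration only: C<toy_split_entry> and C<TOY_MAX_COLS> are
hypothetical names, this is not how F<embed.pl> itself is implemented,
and the sketch assumes a single complete entry and ignores anything else
the real file may contain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_COLS 16

/* Split one table entry on '|' and trim blanks around each column.
 * The line is modified in place; cols[] receives pointers into it.
 * Returns the number of columns stored (at most TOY_MAX_COLS). */
static size_t toy_split_entry(char *line, char *cols[TOY_MAX_COLS])
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL);
    while (n < TOY_MAX_COLS) {
        char *bar = strchr(p, '|');
        char *end;

        if (bar)
            *bar = '\0';                    /* terminate this column */
        while (*p == ' ' || *p == '\t')     /* trim leading blanks */
            p++;
        end = p + strlen(p);
        while (end > p && (end[-1] == ' ' || end[-1] == '\t'))
            *--end = '\0';                  /* trim trailing blanks */
        cols[n++] = p;
        if (!bar)
            break;                          /* that was the last column */
        p = bar + 1;
    }
    return n;
}
```

Applied to the sample entry above, column 0 holds the flags (C<Apd>),
column 1 the return type, column 2 the function name, and the remaining
columns one argument each.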

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3)-style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

        IVdf            IV in decimal
        UVuf            UV in decimal
        UVof            UV in octal
        UVxf            UV in hexadecimal
        NVef            NV %e-like
        NVff            NV %f-like
        NVgf            NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

        printf("IV is %" IVdf "\n", iv);

The C<IVdf> will expand to whatever is the correct format for the IVs.
Note that the spaces are required around the format in case the code is
compiled with C++, to maintain compliance with its standard.

Note that there are different "long doubles": Perl will use
whatever the compiler has.

If you are printing addresses of pointers, use %p or UVxf combined
with PTR2UV().

=head2 Formatted Printing of SVs

The contents of SVs may be printed using the C<SVf> format, like so:

    Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg));

where C<err_msg> is an SV.

=for apidoc_section $io_formats
=for apidoc Amnh||SVf
=for apidoc Amh||SVfARG|SV *sv

Not all scalar types are printable. Simple values certainly are: one of
IV, UV, NV, or PV. Also, if the SV is a reference to some value,
either it will be dereferenced and the value printed, or information
about the type of that value and its address is displayed. The results
of printing any other type of SV are undefined and likely to lead to an
interpreter crash. NVs are printed using a C<%g>-ish format.

Note that the spaces are required around the C<SVf> in case the code is
compiled with C++, to maintain compliance with its standard.
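
The format macros in this section and the previous one rely on the
compiler joining adjacent string literals. Standard C uses the same
idiom for the macros in F<inttypes.h>, so the mechanism can be tried
outside perl. The sketch below uses C<PRId64> purely as an analogue of
C<IVdf>; C<toy_format_iv> is a hypothetical helper, and nothing in it is
part of the Perl API.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit value in the style of "%" IVdf.
 * PRId64 expands to a string literal such as "ld" or "lld", which the
 * compiler joins onto the neighbouring literals. */
static int toy_format_iv(char *buf, size_t size, int64_t value)
{
    assert(buf != NULL && size > 0);
    /* The spaces around the macro matter in C++: without them,
     * "..."PRId64 would be read as a literal with a suffix. */
    return snprintf(buf, size, "IV is %" PRId64 "\n", value);
}
```

The caller never writes C<%ld> or C<%lld> itself, so the same source
line stays correct whichever underlying type the platform picks. That is
the portability property C<IVdf> and its relatives provide for perl's
own integer types.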

Note that any filehandle being printed to under UTF-8 must be expecting
UTF-8 in order to get good results and avoid Wide-character warnings.
One way to do this for typical filehandles is to invoke perl with the
C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.)

You can use this to concatenate two scalars:

    SV *var1 = get_sv("var1", GV_ADD);
    SV *var2 = get_sv("var2", GV_ADD);
    SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf,
                        SVfARG(var1), SVfARG(var2));

=head2 Formatted Printing of Strings

If you just want the bytes printed in a 7bit NUL-terminated string, you can
just use C<%s> (assuming they are all really only 7bit). But if there is a
possibility the value will be encoded as UTF-8 or contains bytes above
C<0x7F> (and therefore 8bit), you should instead use the C<UTF8f> format.
And as its parameter, use the C<UTF8fARG()> macro:

    char * msg;

    /* U+2018: \xE2\x80\x98 LEFT SINGLE QUOTATION MARK
       U+2019: \xE2\x80\x99 RIGHT SINGLE QUOTATION MARK */
    if (can_utf8)
      msg = "\xE2\x80\x98Uses fancy quotes\xE2\x80\x99";
    else
      msg = "'Uses simple quotes'";

    Perl_croak(aTHX_ "The message is: %" UTF8f "\n",
                     UTF8fARG(can_utf8, strlen(msg), msg));

The first parameter to C<UTF8fARG> is a boolean: 1 if the string is in
UTF-8; 0 if the string is in native byte encoding (Latin1).
The second parameter is the number of bytes in the string to print.
And the third and final parameter is a pointer to the first byte in the
string.

Note that any filehandle being printed to under UTF-8 must be expecting
UTF-8 in order to get good results and avoid Wide-character warnings.
One way to do this for typical filehandles is to invoke perl with the
C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.)

=for apidoc_section $io_formats
=for apidoc Amnh||UTF8f
=for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str

=cut

=head2 Formatted Printing of C<Size_t> and C<SSize_t>

The most general way to do this is to cast them to a UV or IV, and
print as in the
L<previous section|/Formatted Printing of IVs, UVs, and NVs>.

But if you're using C<PerlIO_printf()>, it's less typing and visual
clutter to use the C<%z> length modifier (for I<siZe>):

        PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len);

This modifier is not portable, so its use should be restricted to
C<PerlIO_printf()>.

=head2 Formatted Printing of C<Ptrdiff_t>, C<intmax_t>, C<short> and other special sizes

There are modifiers for these special situations if you are using
C<PerlIO_printf()>. See L<perlfunc/size>.

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

        PTR2UV(pointer)
        PTR2IV(pointer)
        PTR2NV(pointer)
        INT2PTR(pointertotype, integer)

=for apidoc_section $casting
=for apidoc Amh|type|INT2PTR|type|int value
=for apidoc Amh|UV|PTR2UV|void * ptr
=for apidoc Amh|IV|PTR2IV|void * ptr
=for apidoc Amh|NV|PTR2NV|void * ptr

For example:

        IV  iv = ...;
        SV *sv = INT2PTR(SV*, iv);

and

        AV *av = ...;
        UV  uv = PTR2UV(av);

There are also

        PTR2nat(pointer)   /* pointer to integer of PTRSIZE */
        PTR2ul(pointer)    /* pointer to unsigned long */

=for apidoc Amh|IV|PTR2nat|void *
=for apidoc Amh|unsigned long|PTR2ul|void *

And C<PTRV> which gives the native type for an integer the same size as
pointers, such as C<unsigned> or C<unsigned long>.
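
The round trip these macros perform can be illustrated with the
standard C type C<uintptr_t>, which plays a role comparable to an
integer of pointer size. The following is a sketch by analogy only:
C<toy_ptr2uv>, C<toy_uv2intptr> and C<toy_round_trip> are hypothetical
helpers and do not show how the perl macros are actually defined.

```c
#include <assert.h>
#include <stdint.h>

/* Analogue of PTR2UV(): pointer to an unsigned integer wide enough
 * to hold it. */
static uintptr_t toy_ptr2uv(const void *p)
{
    return (uintptr_t)p;
}

/* Analogue of INT2PTR(int *, u): integer back to a typed pointer. */
static int *toy_uv2intptr(uintptr_t u)
{
    return (int *)(void *)u;
}

/* Send a pointer through an integer and read through the result. */
static int toy_round_trip(int *p)
{
    uintptr_t as_int = toy_ptr2uv(p);
    int *back = toy_uv2intptr(as_int);

    assert(back == p);   /* the conversion must not lose any bits */
    return *back;
}
```

Had the intermediate type been a plain C<int> on a platform with 64-bit
pointers, the upper half of the address could be lost and the final
dereference would be invalid. Choosing the integer type by pointer size
is what avoids that.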

=for apidoc Ayh|type|PTRV

=head2 Exception Handling

There are a couple of macros to do very basic exception handling in XS
modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
be able to use these macros:

        #define NO_XSLOCKS
        #include "XSUB.h"

You can use these macros if you call code that may croak, but you need
to do some cleanup before giving control back to Perl. For example:

        dXCPT;    /* set up necessary variables */

        XCPT_TRY_START {
          code_that_may_croak();
        } XCPT_TRY_END

        XCPT_CATCH
        {
          /* do cleanup here */
          XCPT_RETHROW;
        }

Note that you always have to rethrow an exception that has been
caught. Using these macros, it is not possible to just catch the
exception and ignore it. If you have to ignore the exception, you
have to use a C<call_*> function.

The advantage of using the above macros is that you don't have
to set up an extra function for C<call_*>, and that using these
macros is faster than using C<call_*>.

=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them -- L<perlapi> is one
such manual which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

    /*
    =for apidoc sv_setiv

    Copies an integer into the given SV. Does not handle 'set' magic. See
    L<perlapi/sv_setiv_mg>.

    =cut
    */

Please try and supply some documentation if you add functions to the
Perl core.

=head2 Backwards compatibility

The Perl API changes over time.
New functions are
added or the interfaces of existing functions are
changed. The C<Devel::PPPort> module tries to
provide compatibility code for some of these changes, so XS writers don't
have to code it themselves when supporting multiple versions of Perl.

C<Devel::PPPort> generates a C header file F<ppport.h> that can also
be run as a Perl script. To generate F<ppport.h>, run:

    perl -MDevel::PPPort -eDevel::PPPort::WriteFile

Besides checking existing XS code, the script can also be used to retrieve
compatibility information for various API calls using the C<--api-info>
command line switch. For example:

    % perl ppport.h --api-info=sv_magicext

For details, see S<C<perldoc ppport.h>>.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc.
and produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
a variable number of bytes to represent a character. You can learn more
about Unicode and Perl's Unicode model in L<perlunicode>.

(On EBCDIC platforms, Perl instead uses UTF-EBCDIC, which is a form of
UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8.
UTF-EBCDIC is like UTF-8, but the details are different. The macros
hide the differences from you; just remember that the particular numbers
and bit patterns presented below will differ in UTF-EBCDIC.)

=head2 How can I recognise a UTF-8 string?

You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking -- this
is what makes Unicode input an interesting problem.

In general, you either have to know what you're dealing with, or you
have to guess. The API function C<is_utf8_string> can help; it'll tell
you if a string contains only valid UTF-8 characters, and the chances
of a non-UTF-8 string looking like valid UTF-8 become very small very
quickly with increasing string length. On a character-by-character
basis, C<isUTF8_CHAR>
will tell you whether the current character in a string is valid UTF-8.

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one
byte, just like good ol' ASCII.
Character 128 is stored as 3258C<v194.128>; this continues up to character 191, which is 3259C<v194.191>. Now we've run out of bits (191 is binary 3260C<10111111>) so we move on; character 192 is C<v195.128>. And 3261so it goes on, moving to three bytes at character 2048. 3262L<perlunicode/Unicode Encodings> has pictures of how this works. 3263 3264Assuming you know you're dealing with a UTF-8 string, you can find out 3265how long the first character in it is with the C<UTF8SKIP> macro: 3266 3267 char *utf = "\305\233\340\240\201"; 3268 I32 len; 3269 3270 len = UTF8SKIP(utf); /* len is 2 here */ 3271 utf += len; 3272 len = UTF8SKIP(utf); /* len is 3 here */ 3273 3274Another way to skip over characters in a UTF-8 string is to use 3275C<utf8_hop>, which takes a string and a number of characters to skip 3276over. You're on your own about bounds checking, though, so don't use it 3277lightly. 3278 3279All bytes in a multi-byte UTF-8 character will have the high bit set, 3280so you can test if you need to do something special with this 3281character like this (the C<UTF8_IS_INVARIANT()> is a macro that tests 3282whether the byte is encoded as a single byte even in UTF-8): 3283 3284 U8 *utf; /* Initialize this to point to the beginning of the 3285 sequence to convert */ 3286 U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence 3287 pointed to by 'utf' */ 3288 UV uv; /* Returned code point; note: a UV, not a U8, not a 3289 char */ 3290 STRLEN len; /* Returned length of character in bytes */ 3291 3292 if (!UTF8_IS_INVARIANT(*utf)) 3293 /* Must treat this as UTF-8 */ 3294 uv = utf8_to_uvchr_buf(utf, utf_end, &len); 3295 else 3296 /* OK to treat this character as a byte */ 3297 uv = *utf; 3298 3299You can also see in that example that we use C<utf8_to_uvchr_buf> to get the 3300value of the character; the inverse function C<uvchr_to_utf8> is available 3301for putting a UV into UTF-8: 3302 3303 if (!UVCHR_IS_INVARIANT(uv)) 3304 /* Must treat this as UTF8 */ 3305 
        utf8 = uvchr_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!

(Note that we don't have to test for invariant characters in the
examples above. The functions work on any well-formed UTF-8 input.
It's just that it's faster to avoid the function overhead when it's not
needed.)

=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with UTF-8 strings and non-UTF-8 strings
slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
string is internally encoded as UTF-8. Without it, the byte value is the
codepoint number and vice versa. This flag is only meaningful if the SV
is C<SvPOK> or immediately after stringification via C<SvPV> or a
similar macro. You can check and manipulate this flag with the
following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
UTF-8 data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable (wrong) results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF-8, and contains a byte sequence that could be UTF-8 --
especially when combining non-UTF-8 and UTF-8 strings.
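
To see why the flag matters, here is a small standalone sketch in plain C.
It deliberately does not use the Perl API: C<toy_utf8_skip> and
C<toy_char_count> are hypothetical helpers invented for this illustration,
and they assume well-formed input. The same buffer yields a different
character count depending on whether the caller says it is UTF-8, which is
exactly the information C<SVf_UTF8> carries alongside the PV.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Perl API: the number of bytes in
 * the UTF-8 character whose first byte is b (assumes well-formed input). */
static size_t toy_utf8_skip(unsigned char b)
{
    if (b < 0x80)           return 1;   /* invariant (ASCII) byte */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx starts a 2-byte char */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx starts a 3-byte char */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx starts a 4-byte char */
    return 1;                           /* anything else: step one byte */
}

/* Count the characters in buf[0..len).  If is_utf8 is false every byte is
 * one character; if it is true each multi-byte sequence counts once. */
static size_t toy_char_count(const unsigned char *buf, size_t len, int is_utf8)
{
    size_t count = 0;
    size_t i = 0;

    if (!is_utf8)
        return len;

    while (i < len) {
        i += toy_utf8_skip(buf[i]);
        count++;
    }
    return count;
}
```

Given the two bytes C<0xC5 0x9B>, this sketch counts two characters when the
buffer is treated as bytes and one when it is treated as UTF-8. Nothing in
the bytes themselves says which answer is right.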
3345 3346Never forget that the C<SVf_UTF8> flag is separate from the PV value; you 3347need to be sure you don't accidentally knock it off while you're 3348manipulating SVs. More specifically, you cannot expect to do this: 3349 3350 SV *sv; 3351 SV *nsv; 3352 STRLEN len; 3353 char *p; 3354 3355 p = SvPV(sv, len); 3356 frobnicate(p); 3357 nsv = newSVpvn(p, len); 3358 3359The C<char*> string does not tell you the whole story, and you can't 3360copy or reconstruct an SV just by copying the string value. Check if the 3361old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act 3362accordingly: 3363 3364 p = SvPV(sv, len); 3365 is_utf8 = SvUTF8(sv); 3366 frobnicate(p, is_utf8); 3367 nsv = newSVpvn(p, len); 3368 if (is_utf8) 3369 SvUTF8_on(nsv); 3370 3371In the above, your C<frobnicate> function has been changed to be made 3372aware of whether or not it's dealing with UTF-8 data, so that it can 3373handle the string appropriately. 3374 3375Since just passing an SV to an XS function and copying the data of 3376the SV is not enough to copy the UTF8 flags, even less right is just 3377passing a S<C<char *>> to an XS function. 3378 3379For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the 3380string in an SV is to be I<treated> as UTF-8. This takes into account 3381if the call to the XS function is being made from within the scope of 3382L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the 3383UTF-8 string are to be exposed, rather than the character they 3384represent. But this pragma should only really be used for debugging and 3385perhaps low-level testing at the byte level. Hence most XS code need 3386not concern itself with this, but various areas of the perl core do need 3387to support it. 3388 3389And this isn't the whole story. Starting in Perl v5.12, strings that 3390aren't encoded in UTF-8 may also be treated as Unicode under various 3391conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>). 
3392This is only really a problem for characters whose ordinals are between 3393128 and 255, and their behavior varies under ASCII versus Unicode rules 3394in ways that your code cares about (see L<perlunicode/The "Unicode Bug">). 3395There is no published API for dealing with this, as it is subject to 3396change, but you can look at the code for C<pp_lc> in F<pp.c> for an 3397example as to how it's currently done. 3398 3399=head2 How do I pass a Perl string to a C library? 3400 3401A Perl string, conceptually, is an opaque sequence of code points. 3402Many C libraries expect their inputs to be "classical" C strings, which are 3403arrays of octets 1-255, terminated with a NUL byte. Your job when writing 3404an interface between Perl and a C library is to define the mapping between 3405Perl and that library. 3406 3407Generally speaking, C<SvPVbyte> and related macros suit this task well. 3408These assume that your Perl string is a "byte string", i.e., is either 3409raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8. 3410 3411Alternatively, if your C library expects UTF-8 text, you can use 3412C<SvPVutf8> and related macros. This has the same effect as encoding 3413to UTF-8 then calling the corresponding C<SvPVbyte>-related macro. 3414 3415Some C libraries may expect other encodings (e.g., UTF-16LE). To give 3416Perl strings to such libraries 3417you must either do that encoding in Perl then use C<SvPVbyte>, or 3418use an intermediary C library to convert from however Perl stores the 3419string to the desired encoding. 3420 3421Take care also that NULs in your Perl string don't confuse the C 3422library. If possible, give the string's length to the C library; if that's 3423not possible, consider rejecting strings that contain NUL bytes. 3424 3425=head3 What about C<SvPV>, C<SvPV_nolen>, etc.? 3426 3427Consider a 3-character Perl string C<$foo = "\x64\x78\x8c">. 
3428Perl can store these 3 characters either of two ways: 3429 3430=over 3431 3432=item * bytes: 0x64 0x78 0x8c 3433 3434=item * UTF-8: 0x64 0x78 0xc2 0x8c 3435 3436=back 3437 3438Now let's say you convert C<$foo> to a C string thus: 3439 3440 STRLEN strlen; 3441 char *str = SvPV(foo_sv, strlen); 3442 3443At this point C<str> could point to a 3-byte C string or a 4-byte one. 3444 3445Generally speaking, we want C<str> to be the same regardless of how 3446Perl stores C<$foo>, so the ambiguity here is undesirable. C<SvPVbyte> 3447and C<SvPVutf8> solve that by giving predictable output: use 3448C<SvPVbyte> if your C library expects byte strings, or C<SvPVutf8> 3449if it expects UTF-8. 3450 3451If your C library happens to support both encodings, then C<SvPV>--always 3452in tandem with lookups to C<SvUTF8>!--may be safe and (slightly) more 3453efficient. 3454 3455B<TESTING> B<TIP:> Use L<utf8>'s C<upgrade> and C<downgrade> functions 3456in your tests to ensure consistent handling regardless of Perl's 3457internal encoding. 3458 3459=head2 How do I convert a string to UTF-8? 3460 3461If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade 3462the non-UTF-8 strings to UTF-8. If you've got an SV, the easiest way to do 3463this is: 3464 3465 sv_utf8_upgrade(sv); 3466 3467However, you must not do this, for example: 3468 3469 if (!SvUTF8(left)) 3470 sv_utf8_upgrade(left); 3471 3472If you do this in a binary operator, you will actually change one of the 3473strings that came into the operator, and, while it shouldn't be noticeable 3474by the end user, it can cause problems in deficient code. 3475 3476Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its 3477string argument. This is useful for having the data available for 3478comparisons and so on, without harming the original SV. 
There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

=head2 How do I compare strings?

L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic
comparison of two SVs, and handle UTF-8ness properly. Note, however,
that Unicode specifies a much fancier mechanism for collation, available
via the L<Unicode::Collate> module.

To compare two strings for equality/non-equality, you can use
L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memEQ> as usual,
except the strings must be both UTF-8 or not UTF-8 encoded.

To compare two strings case-insensitively, use
L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have
the same UTF-8ness).

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8
or not. But you can tell if an SV is to be treated as UTF-8 by calling
C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar
macro. And, you can tell if an SV is actually UTF-8 (even if it is not to
be treated as such) by looking at its C<SvUTF8> flag (again after
stringifying it). Don't forget to set the flag if something should be
UTF-8.
Treat the flag as part of the PV, even though it's not -- if you pass on
the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character UV to a UTF-8 string, B<always> use
C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.
3526 3527=item * 3528 3529Mixing UTF-8 and non-UTF-8 strings is 3530tricky. Use C<bytes_to_utf8> to get 3531a new string which is UTF-8 encoded, and then combine them. 3532 3533=back 3534 3535=head1 Custom Operators 3536 3537Custom operator support is an experimental feature that allows you to 3538define your own ops. This is primarily to allow the building of 3539interpreters for other languages in the Perl core, but it also allows 3540optimizations through the creation of "macro-ops" (ops which perform the 3541functions of multiple ops which are usually executed together, such as 3542C<gvsv, gvsv, add>.) 3543 3544This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl 3545core does not "know" anything special about this op type, and so it will 3546not be involved in any optimizations. This also means that you can 3547define your custom ops to be any op structure -- unary, binary, list and 3548so on -- you like. 3549 3550It's important to know what custom operators won't do for you. They 3551won't let you add new syntax to Perl, directly. They won't even let you 3552add new keywords, directly. In fact, they won't change the way Perl 3553compiles a program at all. You have to do those changes yourself, after 3554Perl has compiled the program. You do this either by manipulating the op 3555tree using a C<CHECK> block and the C<B::Generate> module, or by adding 3556a custom peephole optimizer with the C<optimize> module. 3557 3558When you do this, you replace ordinary Perl ops with custom ops by 3559creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own 3560PP function. This should be defined in XS code, and should look like 3561the PP ops in C<pp_*.c>. You are responsible for ensuring that your op 3562takes the appropriate number of values from the stack, and you are 3563responsible for adding stack marks if necessary. 
3564 3565You should also "register" your op with the Perl interpreter so that it 3566can produce sensible error and warning messages. Since it is possible to 3567have multiple custom ops within the one "logical" op type C<OP_CUSTOM>, 3568Perl uses the value of C<< o->op_ppaddr >> to determine which custom op 3569it is dealing with. You should create an C<XOP> structure for each 3570ppaddr you use, set the properties of the custom op with 3571C<XopENTRY_set>, and register the structure against the ppaddr using 3572C<Perl_custom_op_register>. A trivial example might look like: 3573 3574=for apidoc_section $optree_manipulation 3575=for apidoc Ayh||XOP 3576 3577 static XOP my_xop; 3578 static OP *my_pp(pTHX); 3579 3580 BOOT: 3581 XopENTRY_set(&my_xop, xop_name, "myxop"); 3582 XopENTRY_set(&my_xop, xop_desc, "Useless custom op"); 3583 Perl_custom_op_register(aTHX_ my_pp, &my_xop); 3584 3585The available fields in the structure are: 3586 3587=over 4 3588 3589=item xop_name 3590 3591A short name for your op. This will be included in some error messages, 3592and will also be returned as C<< $op->name >> by the L<B|B> module, so 3593it will appear in the output of module like L<B::Concise|B::Concise>. 3594 3595=item xop_desc 3596 3597A short description of the function of the op. 3598 3599=item xop_class 3600 3601Which of the various C<*OP> structures this op uses. This should be one of 3602the C<OA_*> constants from F<op.h>, namely 3603 3604=over 4 3605 3606=item OA_BASEOP 3607 3608=item OA_UNOP 3609 3610=item OA_BINOP 3611 3612=item OA_LOGOP 3613 3614=item OA_LISTOP 3615 3616=item OA_PMOP 3617 3618=item OA_SVOP 3619 3620=item OA_PADOP 3621 3622=item OA_PVOP_OR_SVOP 3623 3624This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because 3625the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead. 

=item OA_LOOP

=item OA_COP

=for apidoc_section $optree_manipulation
=for apidoc Amnh||OA_BASEOP
=for apidoc_item OA_BINOP
=for apidoc_item OA_COP
=for apidoc_item OA_LISTOP
=for apidoc_item OA_LOGOP
=for apidoc_item OA_PADOP
=for apidoc_item OA_PMOP
=for apidoc_item OA_PVOP_OR_SVOP
=for apidoc_item OA_SVOP
=for apidoc_item OA_UNOP
=for apidoc_item OA_LOOP

=back

The other C<OA_*> constants should not be used.

=item xop_peep

This member is of type C<Perl_cpeep_t>, which expands to C<void
(*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)>. If it is set, this function
will be called from C<Perl_rpeep> when ops of this type are encountered
by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.

=for apidoc_section $optree_manipulation
=for apidoc Ayh||Perl_cpeep_t

=back

C<B::Generate> directly supports the creation of custom ops by name.

=head1 Stacks

Descriptions above occasionally refer to "the stack", but there are in fact
many stack-like data structures within the perl interpreter. When otherwise
unqualified, "the stack" usually refers to the value stack.

The various stacks have different purposes, and operate in slightly different
ways. Their differences are noted below.

=head2 Value Stack

This stack stores the values that regular perl code is operating on, usually
intermediate values of expressions within a statement. The stack itself is
formed of an array of SV pointers.

The base of this stack is pointed to by the interpreter variable
C<PL_stack_base>, of type C<SV **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_stack_base

The head of the stack is C<PL_stack_sp>, and points to the most
recently-pushed item.

=for apidoc Amnh||PL_stack_sp

Items are pushed to the stack by using the C<PUSHs()> macro or its variants
described above: C<XPUSHs()>, C<mPUSHs()>, C<mXPUSHs()> and the typed
versions. Note carefully that the non-C<X> versions of these macros do not
check the size of the stack and assume it to be big enough. These must be
paired with a suitable check of the stack's size, such as the C<EXTEND> macro,
to ensure it is large enough. For example:

    EXTEND(SP, 4);
    mPUSHi(10);
    mPUSHi(20);
    mPUSHi(30);
    mPUSHi(40);

This is slightly more performant than making four separate checks in four
separate C<mXPUSHi()> calls.

As a further performance optimisation, the various C<PUSH> macros all operate
using a local variable C<SP>, rather than the interpreter-global variable
C<PL_stack_sp>. This variable is declared by the C<dSP> macro - though it is
normally implied by XSUBs and similar so it is rare you have to consider it
directly. Once declared, the C<PUSH> macros will operate only on this local
variable, so before invoking any other perl core functions you must use the
C<PUTBACK> macro to return the value from the local C<SP> variable back to
the interpreter variable. Similarly, after calling a perl core function which
may have had reason to move the stack or push/pop values to it, you must use
the C<SPAGAIN> macro which refreshes the local C<SP> value back from the
interpreter one.

Items are popped from the stack by using the C<POPs> macro or its typed
versions. There is also a macro C<TOPs> that inspects the topmost item without
removing it.

=for apidoc_section $stack
=for apidoc Amnh||TOPs

Note specifically that SV pointers on the value stack do not contribute to the
overall reference count of the xVs being referred to.
If newly-created xVs are
being pushed to the stack you must arrange for them to be destroyed at a
suitable time; usually by using one of the C<mPUSH*> macros or C<sv_2mortal()>
to mortalise the xV.

=head2 Mark Stack

The value stack stores individual perl scalar values as temporaries between
expressions. Some perl expressions operate on entire lists; for that purpose
we need to know where on the stack each list begins. This is the purpose of the
mark stack.

The mark stack stores integers as I32 values, which are the height of the
value stack at the time before the list began; thus the mark itself actually
points to the value stack entry one before the list. The list itself starts at
C<mark + 1>.

The base of this stack is pointed to by the interpreter variable
C<PL_markstack>, of type C<I32 *>.

=for apidoc_section $stack
=for apidoc Amnh||PL_markstack

The head of the stack is C<PL_markstack_ptr>, and points to the most
recently-pushed item.

=for apidoc Amnh||PL_markstack_ptr

Items are pushed to the stack by using the C<PUSHMARK()> macro. Even though
the stack itself stores (value) stack indices as integers, the C<PUSHMARK>
macro should be given a stack pointer directly; it will calculate the index
offset by comparing to the C<PL_stack_base> variable. Thus almost always the
code to perform this is

    PUSHMARK(SP);

Items are popped from the stack by the C<POPMARK> macro. There is also a macro
C<TOPMARK> that inspects the topmost item without removing it. These macros
return I32 index values directly. There is also the C<dMARK> macro which
declares a new SV double-pointer variable, called C<mark>, which points at the
marked stack slot; this is the usual macro that C code will use when operating
on lists given on the stack.
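
The relationship between the two stacks can be modelled in a few lines of
plain C. The following is only a toy sketch and not the interpreter's real
code: every name in it (C<ToyStacks>, C<toy_pushmark> and so on) is
hypothetical, plain C<int> values stand in for SV pointers, and slot 0 of the
value array is treated as an unused sentinel so that "empty" means C<sp ==
base>. It shows a mark being recorded as an offset from the base, and a list
being read from C<mark + 1> up to the stack pointer, even after the value
array has been reallocated.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_MARKS 16

/* Toy stand-ins; none of these names exist in the Perl API. */
typedef struct {
    int       *base;                  /* plays the role of PL_stack_base */
    int       *sp;                    /* plays the role of PL_stack_sp   */
    size_t     cap;                   /* allocated slots, sentinel included */
    ptrdiff_t  marks[TOY_MAX_MARKS];  /* mark stack: offsets, not pointers */
    int        nmarks;
} ToyStacks;

static int toy_init(ToyStacks *st, size_t cap)
{
    if (cap < 1)
        cap = 1;
    st->base = malloc(cap * sizeof *st->base);
    if (st->base == NULL)
        return -1;
    st->base[0] = 0;          /* sentinel slot */
    st->sp = st->base;        /* sp == base means the stack is empty */
    st->cap = cap;
    st->nmarks = 0;
    return 0;
}

/* Record where the next list will begin, as an offset from the base. */
static int toy_pushmark(ToyStacks *st)
{
    if (st->nmarks >= TOY_MAX_MARKS)
        return -1;
    st->marks[st->nmarks++] = st->sp - st->base;
    return 0;
}

/* Push one value, growing (and possibly moving) the array if it is full. */
static int toy_push(ToyStacks *st, int value)
{
    size_t used = (size_t)(st->sp - st->base) + 1;

    if (used == st->cap) {
        size_t newcap = st->cap * 2;
        int *moved = realloc(st->base, newcap * sizeof *moved);
        if (moved == NULL)
            return -1;
        st->sp = moved + (used - 1);
        st->base = moved;
        st->cap = newcap;
    }
    *++st->sp = value;
    return 0;
}

/* Pop the top mark, sum the list that starts at mark + 1, then drop it. */
static long toy_sum_list(ToyStacks *st)
{
    long total = 0;
    int *mark;
    int *p;

    if (st->nmarks == 0)
        return 0;
    mark = st->base + st->marks[--st->nmarks];
    for (p = mark + 1; p <= st->sp; p++)
        total += *p;
    st->sp = mark;
    return total;
}
```

Because the mark is kept as an offset, it remains meaningful when
C<toy_push> moves the array; a saved pointer would not.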

As noted above, the C<mark> variable itself will point at the most recently
pushed value on the value stack before the list begins, and so the list itself
starts at C<mark + 1>. The values of the list may be iterated by code such as

    for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) {
        SV *item = *svp;
        ...
    }

Note specifically that when the list is empty, C<mark> will
equal C<PL_stack_sp>.

Because the C<mark> variable is converted to a pointer on the value stack,
extra care must be taken if C<EXTEND> or any of the C<XPUSH> macros are
invoked within the function, because the stack may need to be moved to
extend it and so the existing pointer will now be invalid. If this may be a
problem, a possible solution is to track the mark offset as an integer and
recompute the mark pointer later on, after the stack has been moved.

    I32 markoff = POPMARK;

    ...

    SV **mark = PL_stack_base + markoff;

=head2 Temporaries Stack

As noted above, xV references on the main value stack do not contribute to the
reference count of an xV, and so another mechanism is used to track when
temporary values which live on the stack must be released. This is the job of
the temporaries stack.

The temporaries stack stores pointers to xVs whose reference counts will be
decremented soon.

The base of this stack is pointed to by the interpreter variable
C<PL_tmps_stack>, of type C<SV **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_tmps_stack

The head of the stack is indexed by C<PL_tmps_ix>, an integer which stores the
index in the array of the most recently-pushed item.

=for apidoc Amnh||PL_tmps_ix

There is no public API to directly push items to the temporaries stack.
Instead, 3815the API function C<sv_2mortal()> is used to mortalize an xV, adding its 3816address to the temporaries stack. 3817 3818Likewise, there is no public API to read values from the temporaries stack. 3819Instead, the macros C<SAVETMPS> and C<FREETMPS> are used. The C<SAVETMPS> 3820macro establishes the base levels of the temporaries stack, by capturing the 3821current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous 3822value to the save stack. Thereafter, whenever C<FREETMPS> is invoked all of 3823the temporaries that have been pushed since that level are reclaimed. 3824 3825=for apidoc_section $stack 3826=for apidoc Amnh||PL_tmps_floor 3827 3828While it is common to see these two macros in pairs within an C<ENTER>/ 3829C<LEAVE> pair, it is not necessary to match them. It is permitted to invoke 3830C<FREETMPS> multiple times since the most recent C<SAVETMPS>; for example in a 3831loop iterating over elements of a list. While you can invoke C<SAVETMPS> 3832multiple times within a scope pair, it is unlikely to be useful. Subsequent 3833invocations will move the temporaries floor further up, thus effectively 3834trapping the existing temporaries to only be released at the end of the scope. 3835 3836=head2 Save Stack 3837 3838The save stack is used by perl to implement the C<local> keyword and other 3839similar behaviours; any cleanup operations that need to be performed when 3840leaving the current scope. Items pushed to this stack generally capture the 3841current value of some internal variable or state, which will be restored when 3842the scope is unwound due to leaving, C<return>, C<die>, C<goto> or other 3843reasons. 3844 3845Whereas other perl internal stacks store individual items all of the same type 3846(usually SV pointers or integers), the items pushed to the save stack are 3847formed of many different types, having multiple fields to them. 
For example, 3848the C<SAVEt_INT> type needs to store both the address of the C<int> variable 3849to restore, and the value to restore it to. This information could have been 3850stored using fields of a C<struct>, but would have to be large enough to store 3851three pointers in the largest case, which would waste a lot of space in most 3852of the smaller cases. 3853 3854=for apidoc_section $stack 3855=for apidoc Amnh||SAVEt_INT 3856 3857Instead, the stack stores information in a variable-length encoding of C<ANY> 3858structures. The final value pushed is stored in the C<UV> field which encodes 3859the kind of item held by the preceding items; the count and types of which 3860will depend on what kind of item is being stored. The kind field is pushed 3861last because that will be the first field to be popped when unwinding items 3862from the stack. 3863 3864The base of this stack is pointed to by the interpreter variable 3865C<PL_savestack>, of type C<ANY *>. 3866 3867=for apidoc_section $stack 3868=for apidoc Amnh||PL_savestack 3869 3870The head of the stack is indexed by C<PL_savestack_ix>, an integer which 3871stores the index in the array at which the next item should be pushed. (Note 3872that this is different to most other stacks, which reference the most 3873recently-pushed item). 3874 3875=for apidoc_section $stack 3876=for apidoc Amnh||PL_savestack_ix 3877 3878Items are pushed to the save stack by using the various C<SAVE...()> macros. 3879Many of these macros take a variable and store both its address and current 3880value on the save stack, ensuring that value gets restored on scope exit. 3881 3882 SAVEI8(i8) 3883 SAVEI16(i16) 3884 SAVEI32(i32) 3885 SAVEINT(i) 3886 ... 3887 3888There are also a variety of other special-purpose macros which save particular 3889types or values of interest. C<SAVETMPS> has already been mentioned above. 3890Others include C<SAVEFREEPV> which arranges for a PV (i.e. 
a string buffer) to 3891be freed, or C<SAVEDESTRUCTOR> which arranges for a given function pointer to 3892be invoked on scope exit. A full list of such macros can be found in 3893F<scope.h>. 3894 3895There is no public API for popping individual values or items from the save 3896stack. Instead, via the scope stack, the C<ENTER> and C<LEAVE> pair form a way 3897to start and stop nested scopes. Leaving a nested scope via C<LEAVE> will 3898restore all of the saved values that had been pushed since the most recent 3899C<ENTER>. 3900 3901=head2 Scope Stack 3902 3903As with the mark stack to the value stack, the scope stack forms a pair with 3904the save stack. The scope stack stores the height of the save stack at which 3905nested scopes begin, and allows the save stack to be unwound back to that 3906point when the scope is left. 3907 3908When perl is built with debugging enabled, there is a second part to this 3909stack storing human-readable string names describing the type of stack 3910context. Each push operation saves the name as well as the height of the save 3911stack, and each pop operation checks the topmost name with what is expected, 3912causing an assertion failure if the name does not match. 3913 3914The base of this stack is pointed to by the interpreter variable 3915C<PL_scopestack>, of type C<I32 *>. If enabled, the scope stack names are 3916stored in a separate array pointed to by C<PL_scopestack_name>, of type 3917C<const char **>. 3918 3919=for apidoc_section $stack 3920=for apidoc Amnh||PL_scopestack 3921=for apidoc Amnh||PL_scopestack_name 3922 3923The head of the stack is indexed by C<PL_scopestack_ix>, an integer which 3924stores the index of the array or arrays at which the next item should be 3925pushed. (Note that this is different to most other stacks, which reference the 3926most recently-pushed item). 
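
The way the save stack and scope stack cooperate can be sketched with a toy
model in plain C. This is an illustration only and makes several simplifying
assumptions: all names are hypothetical, only C<int> variables can be saved,
each save entry is a fixed-size struct rather than the variable-length C<ANY>
encoding the real save stack uses, and both arrays have a fixed capacity. As
with C<PL_savestack_ix> and C<PL_scopestack_ix>, each index here refers to
the next free slot.

```c
#include <stddef.h>

#define TOY_SAVE_MAX  64
#define TOY_SCOPE_MAX 16

/* One saved int: where it lives and the value to put back. */
typedef struct {
    int *addr;
    int  old_value;
} ToySave;

static ToySave toy_savestack[TOY_SAVE_MAX];
static int     toy_savestack_ix = 0;    /* next free slot */

static int     toy_scopestack[TOY_SCOPE_MAX];
static int     toy_scopestack_ix = 0;   /* next free slot */

/* Begin a nested scope by remembering the current save stack height. */
static int toy_enter(void)
{
    if (toy_scopestack_ix >= TOY_SCOPE_MAX)
        return -1;
    toy_scopestack[toy_scopestack_ix++] = toy_savestack_ix;
    return 0;
}

/* Record the address and current value of *var for later restoration. */
static int toy_save_int(int *var)
{
    if (toy_savestack_ix >= TOY_SAVE_MAX)
        return -1;
    toy_savestack[toy_savestack_ix].addr = var;
    toy_savestack[toy_savestack_ix].old_value = *var;
    toy_savestack_ix++;
    return 0;
}

/* End the innermost scope: unwind the save stack back to the recorded
 * height, restoring values in reverse order of saving. */
static void toy_leave(void)
{
    int floor;

    if (toy_scopestack_ix == 0)
        return;
    floor = toy_scopestack[--toy_scopestack_ix];
    while (toy_savestack_ix > floor) {
        ToySave *entry = &toy_savestack[--toy_savestack_ix];
        *entry->addr = entry->old_value;
    }
}
```

Nesting works because each C<toy_leave> only unwinds entries pushed since
its matching C<toy_enter>; entries from outer scopes stay on the save stack
until their own scope ends.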
3927 3928=for apidoc_section $stack 3929=for apidoc Amnh||PL_scopestack_ix 3930 3931Values are pushed to the scope stack using the C<ENTER> macro, which begins a 3932new nested scope. Any items pushed to the save stack are then restored at the 3933next nested invocation of the C<LEAVE> macro. 3934 3935=head1 Dynamic Scope and the Context Stack 3936 3937B<Note:> this section describes a non-public internal API that is subject 3938to change without notice. 3939 3940=head2 Introduction to the context stack 3941 3942In Perl, dynamic scoping refers to the runtime nesting of things like 3943subroutine calls, evals etc, as well as the entering and exiting of block 3944scopes. For example, the restoring of a C<local>ised variable is 3945determined by the dynamic scope. 3946 3947Perl tracks the dynamic scope by a data structure called the context 3948stack, which is an array of C<PERL_CONTEXT> structures, and which is 3949itself a big union for all the types of context. Whenever a new scope is 3950entered (such as a block, a C<for> loop, or a subroutine call), a new 3951context entry is pushed onto the stack. Similarly when leaving a block or 3952returning from a subroutine call etc. a context is popped. Since the 3953context stack represents the current dynamic scope, it can be searched. 3954For example, C<next LABEL> searches back through the stack looking for a 3955loop context that matches the label; C<return> pops contexts until it 3956finds a sub or eval context or similar; C<caller> examines sub contexts on 3957the stack. 3958 3959=for apidoc_section $concurrency 3960=for apidoc Cyh||PERL_CONTEXT 3961 3962Each context entry is labelled with a context type, C<cx_type>. Typical 3963context types are C<CXt_SUB>, C<CXt_EVAL> etc., as well as C<CXt_BLOCK> 3964and C<CXt_NULL> which represent a basic scope (as pushed by C<pp_enter>) 3965and a sort block. The type determines which part of the context union are 3966valid. 
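
The idea of a typed entry whose type selects the valid part of a union, and
of searching the stack from the innermost entry outwards, can be shown with a
toy model in plain C. This is a sketch only: C<ToyContext>, its fields and
C<toy_find_loop> are hypothetical names invented for this illustration, and
the real C<PERL_CONTEXT> structure holds far more than this.

```c
#include <stddef.h>
#include <string.h>

/* Toy context types; the real ones are the CXt_* constants. */
typedef enum { TOY_CX_BLOCK, TOY_CX_SUB, TOY_CX_LOOP } ToyCxType;

typedef struct {
    ToyCxType type;                       /* says which union member is valid */
    union {
        struct { const char *name;  } sub;    /* valid for TOY_CX_SUB  */
        struct { const char *label; } loop;   /* valid for TOY_CX_LOOP */
    } u;
} ToyContext;

/* Search from the innermost entry (top_ix) outwards for a loop context
 * carrying the given label.  Returns its index, or -1 if none matches. */
static int toy_find_loop(const ToyContext *stack, int top_ix,
                         const char *label)
{
    int i;

    for (i = top_ix; i >= 0; i--) {
        if (stack[i].type == TOY_CX_LOOP
            && stack[i].u.loop.label != NULL
            && strcmp(stack[i].u.loop.label, label) == 0)
            return i;
    }
    return -1;
}
```

The type check comes first in the condition, so the loop member of the union
is only read for entries that actually are loops.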

=for apidoc Cyh ||cx_type

=for apidoc Cmnh||CXt_BLOCK
=for apidoc_item ||CXt_EVAL
=for apidoc_item ||CXt_FORMAT
=for apidoc_item ||CXt_GIVEN
=for apidoc_item ||CXt_LOOP_ARY
=for apidoc_item ||CXt_LOOP_LAZYIV
=for apidoc_item ||CXt_LOOP_LAZYSV
=for apidoc_item ||CXt_LOOP_LIST
=for apidoc_item ||CXt_LOOP_PLAIN
=for apidoc_item ||CXt_NULL
=for apidoc_item ||CXt_SUB
=for apidoc_item ||CXt_SUBST
=for apidoc_item ||CXt_WHEN

The main division in the context struct is between a substitution scope
(C<CXt_SUBST>) and block scopes, which are everything else. The former is
just used while executing C<s///e>, and won't be discussed further
here.

All the block scope types share a common base, which corresponds to
C<CXt_BLOCK>. This stores the old values of various scope-related
variables like C<PL_curpm>, as well as information about the current
scope, such as C<gimme>. On scope exit, the old variables are restored.

Particular block scope types store extra per-type information. For
example, C<CXt_SUB> stores the currently executing CV, while the various
for loop types might hold the original loop variable SV. On scope exit,
the per-type data is processed; for example the CV has its reference count
decremented, and the original loop variable is restored.

The macro C<cxstack> returns the base of the current context stack, while
C<cxstack_ix> is the index of the current frame within that stack.

=for apidoc_section $concurrency
=for apidoc Cmnh|PERL_CONTEXT *|cxstack
=for apidoc Cmnh|I32|cxstack_ix

In fact, the context stack is actually part of a stack-of-stacks system;
whenever something unusual is done such as calling a C<DESTROY> or tie
handler, a new stack is pushed, then popped at the end.
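
The general shape of such a stack (a type tag, a common part shared by every
frame, and a union of per-type data) can be sketched in standalone C. The
following is a toy model under stated assumptions: all C<toy_> names and
fields are invented for illustration and bear no relation to the real layout
of C<PERL_CONTEXT>. It shows only how a search such as the one C<next LABEL>
needs can walk back from the current frame, consulting the tag before touching
the union.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: these types are illustrative, not perl's PERL_CONTEXT. */
typedef enum { TOY_BLOCK, TOY_SUB, TOY_LOOP } toy_cxtype;

typedef struct {
    toy_cxtype type;        /* selects which union member is valid   */
    int        old_sp;      /* common "block" part, shared by all    */
    union {
        struct { const char *name;  } sub;   /* valid for TOY_SUB  */
        struct { const char *label; } loop;  /* valid for TOY_LOOP */
    } u;
} toy_context;

enum { TOY_CX_MAX = 32 };
static toy_context toy_cxstack[TOY_CX_MAX];
static int         toy_cxstack_ix = -1;   /* index of the current frame */

/* Push the part common to every frame and return the frame's address;
 * the caller then fills in any type-specific fields. */
static toy_context *toy_pushblock(toy_cxtype type, int old_sp)
{
    toy_context *cx;
    assert(toy_cxstack_ix + 1 < TOY_CX_MAX);
    cx = &toy_cxstack[++toy_cxstack_ix];
    cx->type   = type;
    cx->old_sp = old_sp;
    return cx;
}

/* Search downwards from the current frame for a loop frame carrying
 * this label. Returns the frame index, or -1 if there is none. */
static int toy_find_loop(const char *label)
{
    int i;
    for (i = toy_cxstack_ix; i >= 0; i--) {
        const toy_context *cx = &toy_cxstack[i];
        if (cx->type == TOY_LOOP && strcmp(cx->u.loop.label, label) == 0)
            return i;
    }
    return -1;
}
```

The tag check in C<toy_find_loop()> is the important part: the C<loop> member
is read only for frames whose type says it is valid.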

Note that the API described here changed considerably in perl 5.24; prior
to that, big macros like C<PUSHBLOCK> and C<POPSUB> were used; in 5.24
they were replaced by the inline static functions described below. In
addition, the ordering and detail of how these macros/functions work
changed in many ways, often subtly. In particular, the old macros didn't
handle saving the savestack and temps stack positions, and required
additional C<ENTER>, C<SAVETMPS> and C<LEAVE> compared to the new functions.
The old-style macros will not be described further.


=head2 Pushing contexts

For pushing a new context, the two basic functions are
C<cx = cx_pushblock()>, which pushes a new basic context block and returns
its address, and a family of similar functions with names like
C<cx_pushsub(cx)> which populate the additional type-dependent fields in
the C<cx> struct. Note that C<CXt_NULL> and C<CXt_BLOCK> don't have their
own push functions, as they don't store any data beyond that pushed by
C<cx_pushblock>.

The fields of the context struct and the arguments to the C<cx_*>
functions are subject to change between perl releases, representing
whatever is convenient or efficient for that release.

A typical context stack pushing can be found in C<pp_entersub>; the
following shows a simplified and stripped-down example of a non-XS call,
along with comments showing roughly what each function does.

    dMARK;
    U8 gimme      = GIMME_V;
    bool hasargs  = cBOOL(PL_op->op_flags & OPf_STACKED);
    OP *retop     = PL_op->op_next;
    I32 old_ss_ix = PL_savestack_ix;
    CV *cv        = ....;
    PERL_CONTEXT *cx;

    /* ... make mortal copies of stack args which are PADTMPs here ... */

    /* ... do any additional savestack pushes here ... */

    /* Now push a new context entry of type 'CXt_SUB'; initially just
     * doing the actions common to all block types: */

    cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);

    /* this does (approximately):
        CXINC;               (cxstack_ix++, growing if necessary)
        cx = CX_CUR();       (and get the address of the new frame)
        cx->cx_type        = CXt_SUB;
        cx->blk_gimme      = gimme;
        cx->blk_oldsp      = MARK - PL_stack_base;
        cx->blk_oldsaveix  = old_ss_ix;
        cx->blk_oldcop     = PL_curcop;
        cx->blk_oldmarksp  = PL_markstack_ptr - PL_markstack;
        cx->blk_oldscopesp = PL_scopestack_ix;
        cx->blk_oldpm      = PL_curpm;
        cx->blk_old_tmpsfloor = PL_tmps_floor;

        PL_tmps_floor        = PL_tmps_ix;
    */


    /* then update the new context frame with subroutine-specific info,
     * such as the CV about to be executed: */

    cx_pushsub(cx, cv, retop, hasargs);

    /* this does (approximately):
        cx->blk_sub.cv          = cv;
        cx->blk_sub.olddepth    = CvDEPTH(cv);
        cx->blk_sub.prevcomppad = PL_comppad;
        cx->cx_type            |= (hasargs) ? CXp_HASARGS : 0;
        cx->blk_sub.retop       = retop;
        SvREFCNT_inc_simple_void_NN(cv);
    */

=for apidoc_section $concurrency
=for apidoc Cmnh||CXINC

Note that C<cx_pushblock()> sets two new floors: for the args stack (to
C<MARK>) and the temps stack (to C<PL_tmps_ix>). While executing at this
scope level, every C<nextstate> (amongst others) will reset the args and
tmps stack levels to these floors. Note that since C<cx_pushblock> uses
the current value of C<PL_tmps_ix> rather than it being passed as an arg,
this dictates at what point C<cx_pushblock> should be called. In
particular, any new mortals which should be freed only on scope exit
(rather than at the next C<nextstate>) should be created first.

Most callers of C<cx_pushblock> simply set the new args stack floor to the
top of the previous stack frame, but for C<CXt_LOOP_LIST> it stores the
items being iterated over on the stack, and so sets C<blk_oldsp> to the
top of these items instead. Note that, contrary to its name, C<blk_oldsp>
doesn't always represent the value to restore C<PL_stack_sp> to on scope
exit.

Note the early capture of C<PL_savestack_ix> to C<old_ss_ix>, which is
later passed as an arg to C<cx_pushblock>. In the case of C<pp_entersub>,
this is because, although most values needing saving are stored in fields
of the context struct, an extra value needs saving only when the debugger
is running, and it doesn't make sense to bloat the struct for this rare
case. So instead it is saved on the savestack. Since this value gets
calculated and saved before the context is pushed, it is necessary to pass
the old value of C<PL_savestack_ix> to C<cx_pushblock>, to ensure that the
saved value gets freed during scope exit. For most users of
C<cx_pushblock>, where nothing needs pushing on the save stack,
C<PL_savestack_ix> is just passed directly as an arg to C<cx_pushblock>.

Note that where possible, values should be saved in the context struct
rather than on the save stack; it's much faster that way.

Normally C<cx_pushblock> should be immediately followed by the appropriate
C<cx_pushfoo>, with nothing between them; this is because if code
in-between could die (e.g. a warning upgraded to fatal), then the context
stack unwinding code in C<dounwind> would see (in the example above) a
C<CXt_SUB> context frame, but without all the subroutine-specific fields
set, and crashes would soon ensue.

=for apidoc dounwind

Where the two must be separate, initially set the type to C<CXt_NULL> or
C<CXt_BLOCK>, and later change it to C<CXt_foo> when doing the
C<cx_pushfoo>. This is exactly what C<pp_enteriter> does, once it's
determined which type of loop it's pushing.

=head2 Popping contexts

Contexts are popped using C<cx_popsub()> etc. and C<cx_popblock()>. Note
however, that unlike C<cx_pushblock>, neither of these functions actually
decrements the current context stack index; this is done separately using
C<CX_POP()>.

=for apidoc_section $concurrency
=for apidoc Cmh|void|CX_POP|PERL_CONTEXT* cx

There are two main ways that contexts are popped. During normal execution
as scopes are exited, functions like C<pp_leave>, C<pp_leaveloop> and
C<pp_leavesub> process and pop just one context using C<cx_popfoo> and
C<cx_popblock>. On the other hand, things like C<pp_return> and C<next>
may have to pop back several scopes until a sub or loop context is found,
and exceptions (such as C<die>) need to pop back contexts until an eval
context is found. Both of these are accomplished by C<dounwind()>, which
is capable of processing and popping all contexts above the target one.

Here is a typical example of context popping, as found in C<pp_leavesub>
(simplified slightly):

    U8 gimme;
    PERL_CONTEXT *cx;
    SV **oldsp;
    OP *retop;

    cx = CX_CUR();

    gimme = cx->blk_gimme;
    oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */

    if (gimme == G_VOID)
        PL_stack_sp = oldsp;
    else
        leave_adjust_stacks(oldsp, oldsp, gimme, 0);

    CX_LEAVE_SCOPE(cx);
    cx_popsub(cx);
    cx_popblock(cx);
    retop = cx->blk_sub.retop;
    CX_POP(cx);

    return retop;

=for apidoc_section $concurrency
=for apidoc Cmh||CX_CUR

The steps above are in a very specific order, designed to be the reverse
order of when the context was pushed. The first thing to do is to copy
and/or protect any return arguments and free any temps in the current
scope. Scope exits like an rvalue sub normally return a mortal copy of
their return args (as opposed to lvalue subs). It is important to make
this copy before the save stack is popped or variables are restored, or
bad things like the following can happen:

    sub f { my $x =...; $x }  # $x freed before we get to copy it
    sub f { /(...)/;    $1 }  # PL_curpm restored before $1 copied

Although we wish to free any temps at the same time, we have to be careful
not to free any temps which are keeping return args alive; nor to free the
temps we have just created while mortal copying return args. Fortunately,
C<leave_adjust_stacks()> is capable of making mortal copies of return args,
shifting args down the stack, and only processing those entries on the
temps stack that are safe to process.

In void context no args are returned, so it's more efficient to skip
calling C<leave_adjust_stacks()>. Also in void context, a C<nextstate> op
is likely to be imminently called which will do a C<FREETMPS>, so there's
no need to do that either.

The next step is to pop savestack entries: C<CX_LEAVE_SCOPE(cx)> is just
defined as C<< LEAVE_SCOPE(cx->blk_oldsaveix) >>. Note that during the
popping, it's possible for perl to call destructors, call C<STORE> to undo
localisations of tied vars, and so on. Any of these can die or call
C<exit()>. In this case, C<dounwind()> will be called, and the current
context stack frame will be re-processed. Thus it is vital that all steps
in popping a context are done in such a way as to support reentrancy. The
alternative, of decrementing C<cxstack_ix> I<before> processing the
frame, would lead to leaks and the like if something died halfway through,
or overwriting of the current frame.
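
One way to make such popping safe to re-enter is to lower the stack index
before running each entry's action, so that a second pass resumes with the
next entry rather than repeating the one that failed. The standalone sketch
below models that idea only. It is an assumption-laden toy: the C<toy_> names
are hypothetical, C<longjmp> stands in for C<die>, and the retry loop stands
in for the unwinding code; it does not reproduce perl's actual savestack or
exception machinery.

```c
#include <assert.h>
#include <setjmp.h>

/* Toy model only; none of these names exist in perl. */
enum { TOY_PENDING_MAX = 16 };

typedef void (*toy_action)(void);

static toy_action toy_pending[TOY_PENDING_MAX];
static int        toy_pending_ix = 0;  /* next free slot               */
static jmp_buf    toy_trap;            /* stands in for an eval trap   */
static int        toy_ran = 0;         /* how many actions have run    */

static void toy_ok(void)  { toy_ran++; }
static void toy_die(void) { toy_ran++; longjmp(toy_trap, 1); }

static void toy_push_action(toy_action a)
{
    assert(toy_pending_ix < TOY_PENDING_MAX);
    toy_pending[toy_pending_ix++] = a;
}

/* Pop entries down to 'base'. The index is lowered BEFORE each action
 * runs, so if the action jumps out, a later call resumes with the next
 * entry instead of repeating the one that died. */
static void toy_leave_scope(int base)
{
    while (toy_pending_ix > base) {
        toy_action a = toy_pending[--toy_pending_ix];
        a();
    }
}

/* Unwind to 'base', carrying on after every simulated die. Returns
 * the number of times a die was trapped. */
static int toy_unwind(int base)
{
    volatile int deaths = 0;
    if (setjmp(toy_trap))
        deaths++;
    toy_leave_scope(base);
    return deaths;
}
```

With three entries pushed, the middle one of which dies, C<toy_unwind(0)>
still runs every action exactly once.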

=for apidoc_section $concurrency
=for apidoc Cmh|void|CX_LEAVE_SCOPE|PERL_CONTEXT* cx

C<CX_LEAVE_SCOPE> itself is safely re-entrant: if only half the savestack
items have been popped before dying and getting trapped by eval, then the
C<CX_LEAVE_SCOPE>s in C<dounwind> or C<pp_leaveeval> will continue where
the first one left off.

The next step is the type-specific context processing; in this case
C<cx_popsub>. In part, this looks like:

    cv = cx->blk_sub.cv;
    CvDEPTH(cv) = cx->blk_sub.olddepth;
    cx->blk_sub.cv = NULL;
    SvREFCNT_dec(cv);

where it's processing the just-executed CV. Note that before it decrements
the CV's reference count, it nulls the C<blk_sub.cv>. This means that if
it re-enters, the CV won't be freed twice. It also means that you can't
rely on such type-specific fields having useful values after the return
from C<cx_popfoo>.

Next, C<cx_popblock> restores all the various interpreter vars to their
previous values or previous high water marks; it expands to:

    PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
    PL_scopestack_ix = cx->blk_oldscopesp;
    PL_curpm         = cx->blk_oldpm;
    PL_curcop        = cx->blk_oldcop;
    PL_tmps_floor    = cx->blk_old_tmpsfloor;

Note that it I<doesn't> restore C<PL_stack_sp>; as mentioned earlier,
which value to restore it to depends on the context type (specifically
C<for (list) {}>), and what args (if any) it returns; and that will
already have been sorted out earlier by C<leave_adjust_stacks()>.

Finally, the context stack pointer is actually decremented by C<CX_POP(cx)>.
After this point, it's possible that the current context frame could
be overwritten by other contexts being pushed. Although things like ties
and C<DESTROY> are supposed to work within a new context stack, it's best
not to assume this. Indeed on debugging builds, C<CX_POP(cx)> deliberately
sets C<cx> to null to detect code that is still relying on the field
values in that context frame. Note in the C<pp_leavesub()> example above,
we grab C<blk_sub.retop> I<before> calling C<CX_POP>.

=head2 Redoing contexts

Finally, there is C<cx_topblock(cx)>, which acts like a super-C<nextstate>
as regards resetting various vars to their base values. It is used in
places like C<pp_next>, C<pp_redo> and C<pp_goto> where rather than
exiting a scope, we want to re-initialise the scope. As well as resetting
C<PL_stack_sp> like C<nextstate>, it also resets C<PL_markstack_ptr>,
C<PL_scopestack_ix> and C<PL_curpm>. Note that it doesn't do a
C<FREETMPS>.


=head1 Slab-based operator allocation

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

Perl's internal error-handling mechanisms implement C<die> (and its internal
equivalents) using longjmp. If this occurs during lexing, parsing or
compilation, we must ensure that any ops allocated as part of the compilation
process are freed. (Older Perl versions did not adequately handle this
situation: when failing a parse, they would leak ops that were stored in
C C<auto> variables and not linked anywhere else.)

To handle this situation, Perl uses I<op slabs> that are attached to the
currently-compiling CV. A slab is a chunk of allocated memory. New ops are
allocated as regions of the slab. If the slab fills up, a new one is created
(and linked from the previous one). When an error occurs and the CV is freed,
any ops remaining are freed.

Each op is preceded by two pointers: one points to the next op in the slab, and
the other points to the slab that owns it. The next-op pointer is needed so
that Perl can iterate over a slab and free all its ops. (Op structures are of
different sizes, so the slab's ops can't merely be treated as a dense array.)
The slab pointer is needed for accessing a reference count on the slab: when
the last op on a slab is freed, the slab itself is freed.

The slab allocator puts the ops at the end of the slab first. This will tend to
allocate the leaves of the op tree first, and the layout will therefore
hopefully be cache-friendly. In addition, this means that there's no need to
store the size of the slab (see below on why slabs vary in size), because Perl
can follow pointers to find the last op.

It might seem possible to eliminate slab reference counts altogether, by having
all ops implicitly attached to C<PL_compcv> when allocated and freed when the
CV is freed. That would also allow C<op_free> to skip C<FreeOp> altogether, and
thus free ops faster. But that doesn't work in those cases where ops need to
survive beyond their CVs, such as re-evals.

The CV also has to have a reference count on the slab. Sometimes the first op
created is immediately freed. Without the CV's own reference, the slab's
reference count would then reach 0, and the slab would be freed with the CV
still pointing to it.

CVs use the C<CVf_SLABBED> flag to indicate that the CV has a reference count
on the slab. When this flag is set, the slab is accessible via C<CvSTART> when
C<CvROOT> is not set, or by subtracting two pointers C<(2*sizeof(I32 *))> from
C<CvROOT> when it is set. The alternative to this approach of sneaking the slab
into C<CvSTART> during compilation would be to enlarge the C<xpvcv> struct by
another pointer. But that would make all CVs larger, even though slab-based op
freeing is typically of benefit only for programs that make significant use of
string eval.
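
The layout described above (a two-pointer header in front of each op,
allocation working backwards from the end of the slab, and a reference count
that includes one reference for the owner) can be modelled in a few lines of
standalone C. This is a sketch only: the C<toy_> types and functions are
hypothetical, there is a single fixed-size slab, freed space is never reused,
and none of it reflects the real slab structures in perl.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: layout and names are illustrative, not perl's. */
typedef struct toy_slab toy_slab;

typedef struct toy_slot {
    struct toy_slot *next;  /* next op in this slab (towards its end) */
    toy_slab        *slab;  /* slab that owns this op                 */
    /* the op's own bytes follow this header */
} toy_slot;

struct toy_slab {
    size_t    refcnt;       /* one per live op, plus one for the owner */
    toy_slot *first;        /* most recently allocated (lowest) op     */
    char     *base;         /* start of the storage area               */
    char     *free_end;     /* ops are carved off below this address   */
};

static int toy_slabs_freed = 0;

static toy_slab *toy_slab_new(size_t bytes)
{
    toy_slab *s = malloc(sizeof *s);
    assert(s);
    bytes -= bytes % sizeof(void *);   /* keep free_end pointer-aligned */
    s->base = malloc(bytes);
    assert(s->base);
    s->free_end = s->base + bytes;
    s->first    = NULL;
    s->refcnt   = 1;                   /* the owner's reference */
    return s;
}

static void toy_slab_decref(toy_slab *s)
{
    if (--s->refcnt == 0) {
        free(s->base);
        free(s);
        toy_slabs_freed++;
    }
}

/* Allocate 'size' bytes for an op, working from the end of the slab
 * towards its start. Returns NULL when the slab is full. */
static void *toy_op_alloc(toy_slab *s, size_t size)
{
    size_t need = sizeof(toy_slot) + size;
    toy_slot *slot;
    need = (need + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);
    if ((size_t)(s->free_end - s->base) < need)
        return NULL;
    s->free_end -= need;
    slot = (toy_slot *)s->free_end;
    slot->next = s->first;
    slot->slab = s;
    s->first   = slot;
    s->refcnt++;
    return slot + 1;
}

/* Free an op: step back to its header and drop one slab reference. */
static void toy_op_free(void *op)
{
    toy_slot *slot = (toy_slot *)op - 1;
    toy_slab_decref(slot->slab);
}
```

In this model the slab survives for as long as either the owner or any op
still holds a reference, which is the property the reference count exists to
provide.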

=for apidoc_section $concurrency
=for apidoc Cmnh| |CVf_SLABBED
=for apidoc_item |OP *|CvROOT|CV * sv
=for apidoc_item |OP *|CvSTART|CV * sv

When the C<CVf_SLABBED> flag is set, the CV takes responsibility for freeing
the slab. If C<CvROOT> is not set when the CV is freed or undeffed, it is
assumed that a compilation error has occurred, so the op slab is traversed and
all the ops are freed.

Under normal circumstances, the CV forgets about its slab (decrementing the
reference count) when the root is attached. So the slab reference counting that
happens when ops are freed takes care of freeing the slab. In some cases, the
CV is told to forget about the slab (C<cv_forget_slab>) precisely so that the
ops can survive after the CV is done away with.

Forgetting the slab when the root is attached is not strictly necessary, but
avoids potential problems with C<CvROOT> being written over. There is code all
over the place, both in core and on CPAN, that does things with C<CvROOT>, so
forgetting the slab makes things more robust and avoids potential problems.

Since the CV takes ownership of its slab when flagged, that flag is never
copied when a CV is cloned, as one CV could free a slab that another CV still
points to, since forced freeing of ops ignores the reference count (but asserts
that it looks right).

To avoid slab fragmentation, freed ops are marked as freed and attached to the
slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused
when possible. Not reusing freed ops would be simpler, but it would result in
significantly higher memory usage for programs with large C<if (DEBUG) {...}>
blocks.

C<SAVEFREEOP> is slightly problematic under this scheme. Sometimes it can cause
an op to be freed after its CV. If the CV has forcibly freed the ops on its
slab and the slab itself, then we will be fiddling with a freed slab. Making
C<SAVEFREEOP> a no-op doesn't help, as sometimes an op can be savefreed when
there is no compilation error, so the op would never be freed. It holds
a reference count on the slab, so the whole slab would leak. So C<SAVEFREEOP>
now sets a special flag on the op (C<< ->op_savefree >>). The forced freeing of
ops after a compilation error won't free any ops thus marked.

Since many pieces of code create tiny subroutines consisting of only a few ops,
and since a huge slab would be quite a bit of baggage for those to carry
around, the first slab is always very small. To avoid allocating too many
slabs for a single CV, each subsequent slab is twice the size of the previous.

Smartmatch expects to be able to allocate an op at run time, run it, and then
throw it away. For that to work the op is simply malloced when C<PL_compcv> hasn't
been set up. So all slab-allocated ops are marked as such (C<< ->op_slabbed >>),
to distinguish them from malloced ops.


=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed>