=for comment
The part of this file between =for mg_vtable.pl markers is auto
generated by mg_vtable.pl; any changes there need to be made instead to
mg_vtable.pl

=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=for apidoc_section $AV
=for apidoc Ayh||AV
=for apidoc_section $HV
=for apidoc Ayh||HV
=for apidoc_section $SV
=for apidoc Ayh||SV

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses several special typedefs to declare variables to hold
integers of (at least) a given size.
Use I8, I16, I32, and I64 to declare a signed integer variable which has
at least as many bits as the number in its name. These all evaluate to
the native C type that is closest to the given number of bits, but no
smaller than that number. For example, on many platforms, a C<short> is
16 bits long, and if so, I16 will evaluate to a C<short>. But on
platforms where a C<short> isn't exactly 16 bits, Perl will use the
smallest type that contains 16 bits or more.

U8, U16, U32, and U64 are to declare the corresponding unsigned integer
types.

If the platform doesn't support 64-bit integers, both I64 and U64 will
be undefined.
Use IV and UV to declare the largest practicable integer types, and
C<L<perlapi/WIDEST_UTYPE>> for the absolute maximum unsigned type, which
may not be usable in all circumstances.

A numeric constant can be specified with L<perlapi/C<INT16_C>>,
L<perlapi/C<UINTMAX_C>>, and similar.

=for apidoc_section $integer
=for apidoc Ayh ||IV
=for apidoc_item ||I8
=for apidoc_item ||I16
=for apidoc_item ||I32
=for apidoc_item ||I64

=for apidoc Ayh ||UV
=for apidoc_item ||U8
=for apidoc_item ||U16
=for apidoc_item ||U32
=for apidoc_item ||U64

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value". You might think that it is misnamed
because it is described as pointing only to strings. However, it is
possible to have it point to other things. For example, it could point
to an array of UVs. But,
using it for non-strings requires care, as the underlying assumption of
much of the internals is that PVs are just for strings. Often, for
example, a trailing C<NUL> is tacked on automatically. The non-string use
is documented only in this paragraph.)

=for apidoc_section $floating
=for apidoc Ayh||NV

The seven routines that create and load an SV are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

=for apidoc_section $string
=for apidoc Ayh||STRLEN

In the unlikely case of an SV requiring more complex initialization, you
can create an empty SV with newSV(len).
If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has the undef value.

    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                          * allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                      SV **, Size_t, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic".
See L</Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a C<NUL> character.
If a string is not C<NUL>-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a C<NUL>-terminated string.
Perl's own functions typically add a trailing C<NUL> for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, Perl's API exposes
several macros that coerce the actual scalar type into an IV, UV, double,
or string:

=over

=item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>)

=item * C<SvNV(SV*)> (C<double>)

=item * Strings are a bit complicated:

=over

=item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>.

This is suitable for Perl strings that represent bytes.

=item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>.

This is suitable for Perl strings that represent characters.

B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant,
which means that if the SV contains non-Unicode code points (e.g.,
0x110000), then the result may contain extensions over valid UTF-8.
See L<perlapi/is_strict_utf8_string> for some methods Perl gives
you to check the UTF-8 validity of these macros' returns.

=item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)>
to fetch the SV's raw internal buffer. This is tricky, though; if your Perl
string
is C<"\xff\xff">, then depending on the SV's internal encoding you might get
back a 2-byte B<OR> a 4-byte C<char*>.
Moreover, if it's the 4-byte string, that could come from either Perl
C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored
as raw octets. To differentiate between these you B<MUST> look up the
SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string
is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be
off).

B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or
similarly-named macros I<without> looking up the SV's UTF8 bit is
almost certainly a bug if non-ASCII input is allowed.

When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies
here as for C<SvPVutf8>.

=back

(See L</How do I pass a Perl string to a C library?> for more details.)

In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned
is placed into the
variable C<len> (these are macros, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use C<SvPVbyte_nolen>,
C<SvPVutf8_nolen>, or C<SvPV_nolen> instead.
The global variable C<PL_na> can also be given to
C<SvPVbyte>/C<SvPVutf8>/C<SvPV>
in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a C<NUL>.

Also remember that C doesn't allow you to safely say C<foo(SvPVbyte(s, len),
len);>. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPVbyte(s, len);
    foo(ptr, len);

=back

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

If you want to write to an existing SV's buffer and set its value to a
string, use SvPVbyte_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
without extra copying:

    (void)SvPVbyte_force(sv, len);
    s = SvGROW(sv, len + needlen + 1);
    /* something that may write up to needlen bytes at s+len, and
       actually writes newlen bytes
         eg. newlen = read(fd, s + len, needlen);
       ignoring errors for these examples
     */
    s[len + newlen] = '\0';
    SvCUR_set(sv, len + newlen);
    SvUTF8_off(sv);
    SvSETMAGIC(sv);

If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn(). If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().

If you don't need the existing content of the SV, you can avoid some
copying with:

    SvPVCLEAR(sv);
    s = SvGROW(sv, needlen + 1);
    /* something that may write up to needlen bytes at s, and
       actually writes newlen bytes
         eg. newlen = read(fd, s, needlen);
    */
    s[newlen] = '\0';
    SvCUR_set(sv, newlen);
    SvPOK_only(sv); /* also clears SVf_UTF8 */
    SvSETMAGIC(sv);

Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().

If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags(). That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:

    Newx(buf, somesize+1, char);
    /* ... fill in buf ... */
    buf[somesize] = '\0';
    sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
    /* buf now belongs to perl, don't release it */

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

Be aware that retrieving the numeric value of an SV can set IOK or NOK
on that SV, even when the SV started as a string. Prior to Perl
5.36.0 retrieving the string value of an integer could set POK, but
this can no longer occur. From 5.36.0 this can be used to distinguish
the original representation of an SV and is intended to make life
simpler for serializers:

    /* references handled elsewhere */
    if (SvIsBOOL(sv)) {
        /* originally boolean */
        ...
    }
    else if (SvPOK(sv)) {
        /* originally a string */
        ...
    }
    else if (SvNIOK(sv)) {
        /* originally numeric */
        ...
    }
    else {
        /* something special or undef */
    }

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, STRLEN val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                      Size_t, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function
extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L</Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", 0);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed.
Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example,
when interfacing Perl code, it'll work correctly for:

    foo(undef);

But won't work when called as:

    $x = undef;
    foo($x);

So, to repeat: always use SvOK() to check whether an sv is defined.

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L</AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L</Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly.
(A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example. Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play. In
the current implementation, the final byte of a string buffer is used as a
copy-on-write reference count. If the buffer is not big enough, then
copy-on-write is skipped. First have a look at an empty string:

  % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
  SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0x7ffb7bc05b50 ""\0
    CUR = 0
    LEN = 10

Notice here the LEN is 10. (It may differ on your platform.) Extend the
length of the string to one less than 10, and do a substitution:

  % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
                                                            Dump($a)'
  SV = PV(0x7ffa04008a70) at 0x7ffa04030390
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    OFFSET = 1
    PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
    CUR = 8
    LEN = 9

Here the number of bytes chopped off (1) is shown next as the OFFSET. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one. (The first character of the string
buffer happens to have changed to "\1" here, not "1", because the current
implementation stores the offset count in the string buffer. This is
subject to change.)
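
The bookkeeping behind the offset hack can be sketched in plain C. This is
only a toy model under stated assumptions: the C<toy_pv> struct and the
functions C<toy_chop> and C<toy_alloc_start> are hypothetical names, not
part of the Perl API, and the model keeps the chopped-byte count in a
separate field rather than inside the buffer as the current implementation
does.

```c
#include <stddef.h>

/* Toy model of a string body; NOT Perl's real SV layout. */
typedef struct {
    char  *pv;      /* visible start of the string (models SvPVX)   */
    size_t cur;     /* visible length (models SvCUR)                */
    size_t len;     /* bytes available from pv onward (models SvLEN) */
    size_t offset;  /* bytes chopped so far (models OFFSET in Dump) */
} toy_pv;

/* Discard everything before ptr, which must point inside the visible
 * string. No bytes are moved: only the bookkeeping changes. */
static void toy_chop(toy_pv *s, char *ptr)
{
    size_t delta = (size_t)(ptr - s->pv);

    s->pv      = ptr;
    s->cur    -= delta;
    s->len    -= delta;
    s->offset += delta;
}

/* The address that would eventually be handed back to the allocator. */
static char *toy_alloc_start(const toy_pv *s)
{
    return s->pv - s->offset;
}
```

Starting from a 10-byte buffer holding "123456789" and chopping one byte,
the model ends up with a visible length of 8, 9 bytes available and an
offset of 1, the same figures as in the second Dump output above.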

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

=for apidoc_section $AV
=for apidoc Amh||AvALLOC|AV* av

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
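
The 'lossy' case can be pictured with a small plain-C model. It is a
sketch only: the C<toy_num> struct, the C<TOY_*> flag bits and the
functions C<toy_new_nv> and C<toy_get_iv> are invented for illustration
and merely mimic the rule described above; Perl's real conversion code
handles many more cases.

```c
/* Hypothetical flag bits; they only mirror the public/private idea. */
enum {
    TOY_IOK  = 1 << 0,  /* public: integer slot is an exact value   */
    TOY_IOKp = 1 << 1,  /* private: integer slot holds something    */
    TOY_NOK  = 1 << 2,  /* public: floating-point slot is valid     */
    TOY_NOKp = 1 << 3   /* private: floating-point slot is in use   */
};

typedef struct {
    unsigned flags;
    long     iv;
    double   nv;
} toy_num;

/* Create a value that starts life as a floating-point number. */
static toy_num toy_new_nv(double nv)
{
    toy_num n = { TOY_NOK | TOY_NOKp, 0, nv };
    return n;
}

/* Fetch the value as an integer and cache the result.
 * Assumes nv is within the range of long. */
static long toy_get_iv(toy_num *n)
{
    if (!(n->flags & TOY_IOKp)) {
        n->iv = (long)n->nv;          /* truncates toward zero */
        n->flags |= TOY_IOKp;         /* integer slot now filled */
        if ((double)n->iv == n->nv)
            n->flags |= TOY_IOK;      /* exact: set the public flag too */
    }
    return n->iv;
}
```

In this model, fetching the integer value of 1.5 fills the integer slot
and sets only the private integer flag, while fetching the integer value
of 2.0 sets the public flag as well.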

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two main, longstanding ways to create and load an AV. The first
method creates an empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(SSize_t num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Perl v5.36 added two new ways to create an AV and allocate a SV** array
without populating it. These are more efficient than a newAV() followed by an
av_extend().

    /* Creates but does not initialize (Zero) the SV** array */
    AV *av = newAV_alloc_x(1);
    /* Creates and does initialize (Zero) the SV** array */
    AV *av = newAV_alloc_xz(1);

The numerical argument refers to the number of array elements to allocate, not
an array index, and must be >0. The first form must only ever be used when all
elements will be initialized before any read occurs. Reading a non-initialized
SV* - i.e. treating a random memory address as a SV* - is a serious bug.

Once the AV has been created, the following operations are possible on it:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, SSize_t num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    SSize_t av_top_index(AV*);
    SV**    av_fetch(AV*, SSize_t key, I32 lval);
    SV**    av_store(AV*, SSize_t key, SV* val);

The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned.
The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero and no element exists at that index, then C<av_fetch> will
store an undef value there.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

A few more:

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, SSize_t key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", 0);

This returns NULL if the variable does not exist.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head3 More efficient working with new or vanilla AVs

Perl v5.36 and v5.38 introduced streamlined, inlined versions of some
functions:

=over

=item * C<av_store_simple>

=item * C<av_fetch_simple>

=item * C<av_push_simple>

=back

These are drop-in replacements, but can only be used on straightforward
AVs that meet the following criteria:

=over

=item * are not magical

=item * are not readonly

=item * are "real" (refcounted) AVs

=item * have an av_top_index value > -2

=back

AVs created using C<newAV()>, C<av_make>, C<newAV_alloc_x>, and
C<newAV_alloc_xz> are all compatible at the time of creation. It is
only if they are declared readonly or unreal, have magic attached, or
are otherwise configured unusually that they will stop being compatible.

Note that some interpreter functions may attach magic to an AV as part
of normal operations. It is therefore safest, unless you are sure of the
lifecycle of an AV, to only use these new functions close to the point
of AV creation.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on it:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you).
The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

The first of these two functions checks if a hash table entry exists, and the
second deletes it.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. The C<hv_undef> function
deletes both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

=for apidoc_section $HV
=for apidoc Ayh||HE

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval. The key and retlen
               arguments are return values for the key and its
               length. The value is returned as the SV* result */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", 0);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH> macro:

    PERL_HASH(hash, key, klen)

The exact implementation of this macro varies by architecture and version
of perl, and the computed value may differ from one invocation of perl to
the next, so the value is only valid for the duration of a single perl
process.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.
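
To see why a value computed by C<PERL_HASH> should not be stored and
reused by another process, consider a toy seeded hash. The sketch below is
plain C and is not Perl's algorithm: C<toy_hash> is a hypothetical
FNV-1a-style function, and its C<seed> parameter is an assumption standing
in for whatever varies between invocations of perl.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy seeded hash (FNV-1a style); NOT the algorithm perl uses. */
static uint32_t toy_hash(uint32_t seed, const char *key, size_t klen)
{
    uint32_t h = 2166136261u ^ seed;   /* mix the seed into the start state */
    size_t i;

    for (i = 0; i < klen; i++) {
        h ^= (unsigned char)key[i];
        h *= 16777619u;
    }
    return h;
}
```

With the same seed the same key always hashes to the same value, but a
different seed gives a different value for that key, so a number computed
under one seed says nothing about where the key lands under another.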

=for apidoc_section $HV
=for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that neither C<hv_store> nor C<hv_store_ent> increments the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs.
Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use
C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code. In perl 5.20, storing
&PL_sv_undef will create a read-only element, because the scalar
&PL_sv_undef itself is stored, not a copy.

Similar problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including other references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>.  The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not.  For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_PVAV    Array
    SVt_PVHV    Hash
    SVt_PVCV    Code
    SVt_PVGV    Glob (possibly a file handle)

Any numerical value returned which is less than C<SVt_PVAV> will be a scalar
of some form.

See L<perlapi/svtype> for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming.  In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class).  Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value.  The C<stash> argument
specifies which class the reference will belong to.  See
L</Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

The following function upgrades C<rv> to a reference if it is not already
one, and creates a new SV for C<rv> to point to.  If C<classname> is
non-null, the SV is blessed into the specified class.  The new SV is
returned.

    SV* newSVrv(SV* rv, const char* classname);

The following three functions copy an integer, unsigned integer or double
into an SV whose reference is C<rv>.  The SV is blessed if C<classname> is
non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

The following function copies the pointer value (I<the address, not the
string!>) into an SV whose reference is C<rv>.  The SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

The following function copies a string into an SV whose reference is C<rv>.
The length of the string must be given in C<length>.  The SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
                      STRLEN length);

The following function tests whether the SV is blessed into the specified
class.  It does not check inheritance relationships.

    int  sv_isa(SV* sv, const char* name);

The following function tests whether the SV is a reference to a blessed object.

    int  sv_isobject(SV* sv);

The following function tests whether the SV is derived from the specified
class.  SV can be either a reference to a blessed object or a string
containing a class name.
This is the function implementing the
C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", GV_ADD);
    AV*  get_av("package::varname", GV_ADD);
    HV*  get_hv("package::varname", GV_ADD);

Notice the use of C<GV_ADD> as the second parameter.  The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<GV_ADD> argument to enable certain extra features.  Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

  Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

  Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism.  SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1.  If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.
At the most basic internal level, reference counts can be manipulated
with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

(There are also suffixed versions of the increment and decrement macros,
for situations where the full generality of these basic macros can be
exchanged for some performance.)

However, the way a programmer should think about references is not so
much in terms of the bare reference count, but in terms of I<ownership>
of references.  A reference to an xV can be owned by any of a variety
of entities: another xV, the Perl interpreter, an XS data structure,
a piece of running code, or a dynamic scope.  An xV generally does not
know what entities own the references to it; it only knows how many
references there are, which is the reference count.

To correctly maintain reference counts, it is essential to keep track
of what references the XS code is manipulating.  The programmer should
always know where a reference has come from and who owns it, and be
aware of any creation or destruction of references, and any transfers
of ownership.  Because ownership isn't represented explicitly in the xV
data structures, only the reference count need be actually maintained
by the code, and that means that this understanding of ownership is not
actually evident in the code.  For example, transferring ownership of a
reference from one owner to another doesn't change the reference count
at all, so may be achieved with no actual code.  (The transferring code
doesn't touch the referenced object, but does need to ensure that the
former owner knows that it no longer owns the reference, and that the
new owner knows that it now does.)

An xV that is visible at the Perl level should not become unreferenced
and thus be destroyed.
Normally, an object will only become unreferenced
when it is no longer visible, often by the same means that makes it
invisible.  For example, a Perl reference value (RV) owns a reference to
its referent, so if the RV is overwritten that reference gets destroyed,
and the no-longer-reachable referent may be destroyed as a result.

Many functions have some kind of reference manipulation as
part of their purpose.  Sometimes this is documented in terms
of ownership of references, and sometimes it is (less helpfully)
documented in terms of changes to reference counts.  For example, the
L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV
(with reference count 1) and increment the reference count of the referent
that was supplied by the caller.  This is best understood as creating
a new reference to the referent, which is owned by the created RV,
and returning to the caller ownership of the sole reference to the RV.
The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not
increment the reference count of the referent, but the RV nevertheless
ends up owning a reference to the referent.  It is therefore implied
that the caller of C<newRV_noinc()> is relinquishing a reference to the
referent, making this conceptually a more complicated operation even
though it does less to the data structures.

For example, imagine you want to return a reference from an XSUB
function.  Inside the XSUB routine, you create an SV which initially
has just a single reference, owned by the XSUB routine.  This reference
needs to be disposed of before the routine is complete, otherwise it
will leak, preventing the SV from ever being destroyed.  So to create
an RV referencing the SV, it is most convenient to pass the SV to
C<newRV_noinc()>, which consumes that reference.
Now the XSUB routine
no longer owns a reference to the SV, but does own a reference to the RV,
which in turn owns a reference to the SV.  The ownership of the reference
to the RV is then transferred by the process of returning the RV from
the XSUB.

There are some convenience functions available that can help with the
destruction of xVs.  These functions introduce the concept of "mortality".
Much documentation speaks of an xV itself being mortal, but this is
misleading.  It is really I<a reference to> an xV that is mortal, and it
is possible for there to be more than one mortal reference to a single xV.
For a reference to be mortal means that it is owned by the temps stack,
one of perl's many internal stacks, which will destroy that reference
"a short time later".  Usually the "short time later" is the end of
the current Perl statement.  However, it gets more complicated around
dynamic scopes: there can be multiple sets of mortal references hanging
around at the same time, with different death dates.  Internally, the
actual determinant for when mortal xV references are destroyed depends
on two macros, C<SAVETMPS> and C<FREETMPS>.  See L<perlcall> and L<perlxs>
and L</Temporaries Stack> below for more details on these macros.

Mortal references are mainly used for xVs that are placed on perl's
main stack.  The stack is problematic for reference tracking, because it
contains a lot of xV references, but doesn't own those references: they
are not counted.  Currently, there are many bugs resulting from xVs being
destroyed while referenced by the stack, because the stack's uncounted
references aren't enough to keep the xVs alive.  So when putting an
(uncounted) reference on the stack, it is vitally important to ensure that
there will be a counted reference to the same xV that will last at least
as long as the uncounted reference.
But it's also important that that
counted reference be cleaned up at an appropriate time, and not unduly
prolong the xV's life.  For there to be a mortal reference is often the
best way to satisfy this requirement, especially if the xV was created
especially to be put on the stack and would otherwise be unreferenced.

To create a mortal reference, use the functions:

    SV*  sv_newmortal()
    SV*  sv_mortalcopy(SV*)
    SV*  sv_2mortal(SV*)

C<sv_newmortal()> creates an SV (with the undefined value) whose sole
reference is mortal.  C<sv_mortalcopy()> creates an xV whose value is a
copy of a supplied xV and whose sole reference is mortal.  C<sv_2mortal()>
mortalises an existing xV reference: it transfers ownership of a reference
from the caller to the temps stack.  Because C<sv_newmortal> gives the new
SV no value, it must normally be given one via C<sv_setpv>, C<sv_setiv>,
etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that is multiple C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

The mortal routines are not just for SVs; AVs and HVs can be
made mortal by passing their address (type-casted to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package.  Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value).  This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package.
To get at the items in other packages, append the
string "::" to the package name.  The items in the C<Foo> package are in
the stash C<Foo::> in C<PL_defstash>.  The items in the C<Bar::Baz> package
are in the stash C<Baz::> in C<Bar::>'s stash.

=for apidoc_section $GV
=for apidoc Amnh||PL_defstash

To get the stash pointer for a particular package, use one of the functions:

    HV*  gv_stashpv(const char* name, I32 flags)
    HV*  gv_stashsv(SV*, I32 flags)

The first function takes a literal string, the second uses the string stored
in the SV.  Remember that a stash is just a hash table, so you get back an
C<HV*>.  The C<flags> argument will create a new package if it is set to
C<GV_ADD>.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want.  The default package is called C<main>.  If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternatively, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV*  SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char*  HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV*  sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash.  The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 I/O Handles

Like AVs and HVs, IO objects are another type of non-scalar SV which
may contain input and output L<PerlIO|perlapio> objects or a C<DIR *>
from opendir().

You can create a new IO object:

    IO*  newIO();

Unlike other SVs, a new IO object is automatically blessed into the
L<IO::File> class.

The IO object contains an input and output PerlIO handle:

    PerlIO *IoIFP(IO *io);
    PerlIO *IoOFP(IO *io);

=for apidoc_section $io
=for apidoc Amh|PerlIO *|IoIFP|IO *io
=for apidoc Amh|PerlIO *|IoOFP|IO *io

Typically if the IO object has been opened on a file, the input handle
is always present, but the output handle is only present if the file
is open for output.  For a file, if both are present they will be the
same PerlIO object.

Distinct input and output PerlIO objects are created for sockets and
character devices.

The IO object also contains other data associated with Perl I/O
handles:

    IV IoLINES(io);                /* $. */
    IV IoPAGE(io);                 /* $% */
    IV IoPAGE_LEN(io);             /* $= */
    IV IoLINES_LEFT(io);           /* $- */
    char *IoTOP_NAME(io);          /* $^ */
    GV *IoTOP_GV(io);              /* $^ */
    char *IoFMT_NAME(io);          /* $~ */
    GV *IoFMT_GV(io);              /* $~ */
    char *IoBOTTOM_NAME(io);
    GV *IoBOTTOM_GV(io);
    char IoTYPE(io);
    U8 IoFLAGS(io);

=for apidoc_sections $io_scn, $formats_section
=for apidoc_section $reports
=for apidoc Amh|IV|IoLINES|IO *io
=for apidoc Amh|IV|IoPAGE|IO *io
=for apidoc Amh|IV|IoPAGE_LEN|IO *io
=for apidoc Amh|IV|IoLINES_LEFT|IO *io
=for apidoc Amh|char *|IoTOP_NAME|IO *io
=for apidoc Amh|GV *|IoTOP_GV|IO *io
=for apidoc Amh|char *|IoFMT_NAME|IO *io
=for apidoc Amh|GV *|IoFMT_GV|IO *io
=for apidoc Amh|char *|IoBOTTOM_NAME|IO *io
=for apidoc Amh|GV *|IoBOTTOM_GV|IO *io
=for apidoc_section $io
=for apidoc Amh|char|IoTYPE|IO *io
=for apidoc Amh|U8|IoFLAGS|IO *io

Most of these are involved with L<formats|perlform>.

C<IoFLAGS()> may contain a combination of flags, the most interesting of
which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>,
settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>.

=for apidoc Amnh||IOf_FLUSH
=for apidoc Amnh||IOf_UNTAINT

The IO object may also contain a directory handle:

    DIR *IoDIRP(io);

=for apidoc Amh|DIR *|IoDIRP|IO *io

suitable for use with PerlDir_read() etc.

All of these accessor macros are lvalues; there are no distinct
C<_set()> macros to modify the members of the IO object.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference.  Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data.  For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data.  The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first.  This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int  dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", GV_ADD);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Read-Only Values

In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
flag bit with read-only scalars.  So the only way to test whether
C<sv_setsv>, etc., will raise a "Modification of a read-only value" error
in those versions is:

    SvREADONLY(sv) && !SvIsCOW(sv)

Under Perl 5.18 and later, C<SvREADONLY> only applies to read-only
variables, and, under 5.20, copy-on-write scalars can also be read-only,
so the above check is incorrect.  You just want:

    SvREADONLY(sv)

If you need to do this check often, define your own macro like this:

    #if PERL_VERSION >= 18
    # define SvTRULYREADONLY(sv) SvREADONLY(sv)
    #else
    # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
    #endif

=head2 Copy on Write

Perl implements a copy-on-write (COW) mechanism for scalars, in which
string copies are not immediately made when requested, but are deferred
until made necessary by one or the other scalar changing.  This is mostly
transparent, but one must take care not to modify string buffers that are
shared by multiple SVs.

You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>.

You can force an SV to make its own copy of its string buffer by calling
C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>.
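
The deferred-copy idea can be illustrated with a stand-alone C model.  This
is a hedged sketch only: C<toy_buf>, C<toy_str>, C<toy_copy>,
C<toy_is_cow>, C<toy_force_normal> and the other C<toy_> names are
hypothetical, and the sharing count kept here is an assumption of the model
rather than a description of how perl stores its COW bookkeeping.  The
point is the ordering: a "copy" merely shares the buffer, and a private
copy is made only just before a write.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy shared buffer: the bytes plus a count of scalars sharing them.
 * Hypothetical model; not perl's real bookkeeping. */
typedef struct {
    size_t sharers;
    size_t len;
    char  *bytes;
} toy_buf;

/* Toy scalar that only points at a (possibly shared) buffer. */
typedef struct {
    toy_buf *buf;
} toy_str;

static toy_buf *toy_buf_new(const char *s)
{
    toy_buf *b = malloc(sizeof *b);
    if (!b) abort();
    b->sharers = 1;
    b->len = strlen(s);
    b->bytes = malloc(b->len + 1);
    if (!b->bytes) abort();
    memcpy(b->bytes, s, b->len + 1);
    return b;
}

static toy_str toy_new(const char *s)
{
    toy_str t;
    t.buf = toy_buf_new(s);
    return t;
}

/* "Copying" a string only shares the buffer; no bytes are duplicated. */
static toy_str toy_copy(toy_str src)
{
    toy_str t;
    src.buf->sharers++;
    t.buf = src.buf;
    return t;
}

/* Model of the SvIsCOW test: is the buffer shared with another scalar? */
static int toy_is_cow(toy_str s)
{
    return s.buf->sharers > 1;
}

/* Model of forcing a private buffer: duplicate the bytes if shared. */
static void toy_force_normal(toy_str *s)
{
    if (s->buf->sharers > 1) {
        toy_buf *mine = toy_buf_new(s->buf->bytes);
        s->buf->sharers--;
        s->buf = mine;
    }
}

/* Every write goes through toy_force_normal first. */
static void toy_set_char(toy_str *s, size_t i, char c)
{
    toy_force_normal(s);
    if (i < s->buf->len)
        s->buf->bytes[i] = c;
}

static void toy_free(toy_str *s)
{
    if (--s->buf->sharers == 0) {
        free(s->buf->bytes);
        free(s->buf);
    }
    s->buf = NULL;
}
```

Writing into the shared buffer directly, without the C<toy_force_normal>
step, would silently change every scalar sharing it, which is the mistake
this section warns against.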

If you want to make the SV drop its string buffer, use
C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply
C<sv_setsv(sv, NULL)>.

All of these functions will croak on read-only scalars (see the previous
section for more on those).

To test that your code is behaving correctly and not modifying COW buffers,
on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with
C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations
into crashes.  You will find it to be marvellously slow, so you may want to
skip perl's own tests.

=head2 Magic Variables

[This section still under construction.  Ignore everything here.  Post no
bills.  Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have.  These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        I32         mg_len;
        SV*         mg_obj;
        char*       mg_ptr;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the C<sv_magic> function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>.
Perl then continues by adding new magic
to the beginning of the linked list of magical features.  Any prior entry
of the same type of magic is deleted.  Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.
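
The list handling just described (new magic goes to the front, and any
earlier entry of the same type is dropped) can be modelled in a few lines
of stand-alone C.  This is a sketch under stated assumptions, not perl's
implementation: C<toy_magic>, C<toy_add_magic>, C<toy_unmagic> and
C<toy_count> are hypothetical names, and only the C<more> and C<type>
members loosely mirror C<mg_moremagic> and C<mg_type>.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy magic entry.  Hypothetical; only loosely mirrors struct magic. */
typedef struct toy_magic {
    struct toy_magic *more;    /* next entry, like mg_moremagic */
    char              type;    /* kind of magic, like mg_type */
    int               payload; /* stand-in for mg_obj / mg_ptr data */
} toy_magic;

/* Remove every entry of the given type; returns the new list head. */
static toy_magic *toy_unmagic(toy_magic *head, char type)
{
    toy_magic **link = &head;
    while (*link) {
        toy_magic *mg = *link;
        if (mg->type == type) {
            *link = mg->more;   /* unlink, then release the entry */
            free(mg);
        } else {
            link = &mg->more;
        }
    }
    return head;
}

/* Drop any prior entry of the same type, then push at the front. */
static toy_magic *toy_add_magic(toy_magic *head, char type, int payload)
{
    toy_magic *mg;
    head = toy_unmagic(head, type);
    mg = malloc(sizeof *mg);
    if (!mg) abort();
    mg->more = head;
    mg->type = type;
    mg->payload = payload;
    return mg;
}

/* Count the entries of one type. */
static int toy_count(const toy_magic *head, char type)
{
    int n = 0;
    for (; head; head = head->more)
        if (head->type == type)
            n++;
    return n;
}
```

Adding C<'~'>, then C<'U'>, then C<'~'> again leaves a two-entry list whose
head is the second C<'~'> entry, followed by the C<'U'> entry.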

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable.  C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
whether C<namlen> is greater than zero or equal to zero respectively.  As a
special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
to contain an C<SV*> and is stored as-is with its REFCNT incremented.

The C<sv_magic> function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L</Magic Virtual Tables> section below.  The C<how> argument is also
stored in the C<mg_type> field.  The value of
C<how> should be chosen from the set of macros
C<PERL_MAGIC_foo> found in F<perl.h>.  Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure.  If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented.  If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, C<PERL_MAGIC_regdatum>,
C<PERL_MAGIC_regdata>, or if it is a NULL pointer, then C<obj> is merely
stored, without the reference count being incremented.

See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
to an SV.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function C<sv_unmagic>:

    int sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

However, note that C<sv_unmagic> removes all magic of a certain C<type> from the
C<SV>.  If you want to remove only certain
magic of a C<type> based on the magic
virtual table, use C<sv_unmagicext> instead:

    int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL>, which is a structure of function pointers and stands for
"Magic Virtual Table" to handle the various operations that might be
applied to that variable.

=for apidoc_section $magic
=for apidoc Ayh||MGVTBL

The C<MGVTBL> has five (or sometimes eight) pointers to the following
routine types:

    int  (*svt_get)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_set)  (pTHX_ SV* sv, MAGIC* mg);
    U32  (*svt_len)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_free) (pTHX_ SV* sv, MAGIC* mg);

    int  (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
                                          const char *name, I32 namlen);
    int  (*svt_dup)  (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
    int  (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);


This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 32 types.  These different structures contain pointers to various
routines that perform additional actions depending on which function is
being called.

   Function pointer    Action taken
   ----------------    ------------
   svt_get             Do something before the value of the SV is
                       retrieved.
   svt_set             Do something after the SV is assigned a value.
   svt_len             Report on the SV's length.
   svt_clear           Clear something the SV represents.
   svt_free            Free any extra storage associated with the SV.

   svt_copy            copy tied variable magic to a tied element
   svt_dup             duplicate a magic structure during thread cloning
   svt_local           copy magic to local value during 'local'

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called.  All the various routines for the various magical types begin
with C<magic_>.  NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.

The last three slots are a recent addition, and for source code
compatibility they are only checked for if one of the three flags
C<MGf_COPY>, C<MGf_DUP>, or C<MGf_LOCAL> is set in mg_flags.
This means that most code can continue declaring
a vtable as a 5-element value.  These three are
currently used exclusively by the threading code, and are highly subject
to change.

=for apidoc_section $magic
=for apidoc Amnh||MGf_COPY
=for apidoc_item ||MGf_DUP
=for apidoc_item ||MGf_LOCAL

The current kinds of Magic Virtual Tables are:

=for comment
This table is generated by regen/mg_vtable.pl.  Any changes made here
will be lost.

=for mg_vtable.pl begin

 mg_type
 (old-style char and macro)   MGVTBL         Type of magic
 --------------------------   ------         -------------
 \0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
 #  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
 %  PERL_MAGIC_rhash          (none)         Extra data for restricted
                                             hashes
 *  PERL_MAGIC_debugvar       vtbl_debugvar  $DB::single, signal, trace
                                             vars
 .  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
 :  PERL_MAGIC_symtab         (none)         Extra data for symbol
                                             tables
 <  PERL_MAGIC_backref        vtbl_backref   For weak ref data
 @  PERL_MAGIC_arylen_p       (none)         To move arylen out of XPVAV
 B  PERL_MAGIC_bm             vtbl_regexp    Boyer-Moore
                                             (fast string search)
 c  PERL_MAGIC_overload_table vtbl_ovrld     Holds overload table
                                             (AMT) on stash
 D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
                                             (@+ and @- vars)
 d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
                                             element
 E  PERL_MAGIC_env            vtbl_env       %ENV hash
 e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
 f  PERL_MAGIC_fm             vtbl_regexp    Formline
                                             ('compiled' format)
 g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target
 H  PERL_MAGIC_hints          vtbl_hints     %^H hash
 h  PERL_MAGIC_hintselem      vtbl_hintselem %^H hash element
 I  PERL_MAGIC_isa            vtbl_isa       @ISA array
 i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
 k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
 L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
 l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename
                                             element
 N  PERL_MAGIC_shared         (none)         Shared between threads
 n  PERL_MAGIC_shared_scalar  (none)         Shared between threads
 o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale transformation
 P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
 p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
 q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
 r  PERL_MAGIC_qr             vtbl_regexp    Precompiled qr// regex
 S  PERL_MAGIC_sig            vtbl_sig       %SIG hash
 s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
 t  PERL_MAGIC_taint          vtbl_taint     Taintedness
 U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by
                                             extensions
 u  PERL_MAGIC_uvar_elem      (none)         Reserved for use by
                                             extensions
 V  PERL_MAGIC_vstring        (none)         SV was vstring literal
 v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
 w  PERL_MAGIC_utf8           vtbl_utf8      Cached UTF-8 information
 X  PERL_MAGIC_destruct       vtbl_destruct  destruct callback
 x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
 Y  PERL_MAGIC_nonelem        vtbl_nonelem   Array element that does not
                                             exist
 y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
                                             variable / smart parameter
                                             vivification
 Z  PERL_MAGIC_hook           vtbl_hook      %{^HOOK} hash
 z  PERL_MAGIC_hookelem       vtbl_hookelem  %{^HOOK} hash element
 \  PERL_MAGIC_lvref          vtbl_lvref     Lvalue reference
                                             constructor
 ]  PERL_MAGIC_checkcall      vtbl_checkcall Inlining/mutation of call
                                             to this CV
 ^  PERL_MAGIC_extvalue       (none)         Value magic available for
                                             use by extensions
 ~  PERL_MAGIC_ext            (none)         Variable magic available
                                             for use by extensions


=for apidoc_section $magic
=for apidoc AmnhU||PERL_MAGIC_arylen
=for apidoc_item ||PERL_MAGIC_arylen_p
=for apidoc_item ||PERL_MAGIC_backref
=for apidoc_item ||PERL_MAGIC_bm
=for apidoc_item ||PERL_MAGIC_checkcall
=for apidoc_item ||PERL_MAGIC_collxfrm
=for apidoc_item ||PERL_MAGIC_dbfile
=for apidoc_item ||PERL_MAGIC_dbline
=for apidoc_item ||PERL_MAGIC_debugvar
=for apidoc_item ||PERL_MAGIC_defelem
=for apidoc_item ||PERL_MAGIC_destruct
=for apidoc_item ||PERL_MAGIC_env
=for apidoc_item ||PERL_MAGIC_envelem
=for apidoc_item ||PERL_MAGIC_ext
=for apidoc_item ||PERL_MAGIC_extvalue
=for apidoc_item ||PERL_MAGIC_fm
=for apidoc_item ||PERL_MAGIC_hints
=for apidoc_item ||PERL_MAGIC_hintselem
=for apidoc_item ||PERL_MAGIC_hook
=for apidoc_item ||PERL_MAGIC_hookelem
=for apidoc_item ||PERL_MAGIC_isa
=for apidoc_item ||PERL_MAGIC_isaelem
=for apidoc_item ||PERL_MAGIC_lvref
=for apidoc_item ||PERL_MAGIC_nkeys
=for apidoc_item ||PERL_MAGIC_nonelem
=for apidoc_item ||PERL_MAGIC_overload_table
=for apidoc_item ||PERL_MAGIC_pos
=for apidoc_item ||PERL_MAGIC_qr
=for apidoc_item ||PERL_MAGIC_regdata
=for apidoc_item ||PERL_MAGIC_regdatum
=for apidoc_item ||PERL_MAGIC_regex_global
=for apidoc_item ||PERL_MAGIC_rhash
=for apidoc_item ||PERL_MAGIC_shared
=for apidoc_item ||PERL_MAGIC_shared_scalar
=for apidoc_item ||PERL_MAGIC_sig
=for apidoc_item ||PERL_MAGIC_sigelem
=for apidoc_item ||PERL_MAGIC_substr
=for apidoc_item ||PERL_MAGIC_sv
=for apidoc_item ||PERL_MAGIC_symtab
=for apidoc_item ||PERL_MAGIC_taint
=for apidoc_item ||PERL_MAGIC_tied
=for apidoc_item ||PERL_MAGIC_tiedelem
=for apidoc_item ||PERL_MAGIC_tiedscalar
=for apidoc_item ||PERL_MAGIC_utf8
=for apidoc_item ||PERL_MAGIC_uvar
=for apidoc_item ||PERL_MAGIC_uvar_elem
=for apidoc_item ||PERL_MAGIC_vec
=for apidoc_item ||PERL_MAGIC_vstring

=for mg_vtable.pl end

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type.  Some internals code makes use of this case
relationship.  However, 'v' and 'V' (vec and v-string) are in no way related.

The C<PERL_MAGIC_ext>, C<PERL_MAGIC_extvalue> and C<PERL_MAGIC_uvar> magic types
are defined specifically for use by extensions and will not be used by perl
itself.  Extensions can use C<PERL_MAGIC_ext> or C<PERL_MAGIC_extvalue> magic to
'attach' private information to variables (typically objects).  This is
especially useful because there is no way for normal perl code to corrupt this
private information (unlike using extra elements of a hash object).
C<PERL_MAGIC_extvalue> is value magic (unlike C<PERL_MAGIC_ext> and
C<PERL_MAGIC_uvar>) meaning that on localization the new value will not be
magical.

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed.
The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(pTHX_ IV, SV*);
        I32 (*uf_set)(pTHX_ IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below. Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));

Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect.

For hashes there is a specialized hook that gives control over hash
keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic
if the "set" function in the C<ufuncs> structure is NULL. The hook
is activated whenever the hash is accessed with a key specified as
an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>,
C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string
through the functions without the C<..._ent> suffix circumvents the
hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description.

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict. Typically only using the magic on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an
C<MGVTBL>, even if all its fields will be C<0>, so that individual
C<MAGIC> pointers can be identified as a particular kind of magic
using their magic virtual table.
C<mg_findext> provides an easy way
to do that:

    STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };

    MAGIC *mg;
    if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
        /* this is really ours, not another module's PERL_MAGIC_ext */
        my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
        ...
    }

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

    MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
                                       * type */

This routine returns a pointer to a C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned. If the
SV has multiple instances of that magical feature, the first one will be
returned. C<mg_findext> can be used to find a C<MAGIC> structure of an SV
based on both its magic type and its magic virtual table:

    MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);

Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type
SVt_PVMG, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has.
If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET, etc methods. To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps -- firstly it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class -
see L</Calling Perl Routines from within C Programs> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", GV_ADD);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>.
It may also return NULL, indicating that the value did not
actually need to be stored in the array. [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object. If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>. Note the value so returned does not
need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object. Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes. They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects. Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants.
The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax. The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

    {
        local $var = 2;
        ...
    }

This construction is I<approximately> equivalent to

    {
        my $oldvar = $var;
        $var = 2;
        ...
        $var = $oldvar;
    }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
more efficient as well.

There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization must
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.
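
The save-and-restore idea behind a I<pseudo-block> can be illustrated
outside of perl with a toy save stack. The following is a minimal sketch in
plain C, not perl's implementation: C<toy_enter>, C<toy_save_int> and
C<toy_leave> are hypothetical stand-ins for C<ENTER>, C<SAVEINT> and
C<LEAVE>, and the fixed-size arrays are an assumption made to keep the
example short.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a save stack; hypothetical names, not the perl API. */
#define TOY_SAVE_MAX 64

struct toy_save { int *addr; int old; };

static struct toy_save toy_saves[TOY_SAVE_MAX];
static size_t toy_top = 0;               /* next free save-stack slot   */
static size_t toy_scopes[TOY_SAVE_MAX];  /* toy_top at each toy_enter() */
static size_t toy_depth = 0;

int toy_var = 1;                         /* the variable being localized */

/* Open a pseudo-block: remember where the save stack currently ends. */
void toy_enter(void) {
    assert(toy_depth < TOY_SAVE_MAX);
    toy_scopes[toy_depth++] = toy_top;
}

/* Record the current value of *p so that toy_leave() can put it back. */
void toy_save_int(int *p) {
    assert(toy_top < TOY_SAVE_MAX);
    toy_saves[toy_top].addr = p;
    toy_saves[toy_top].old  = *p;
    toy_top++;
}

/* Close the pseudo-block: undo its saves in reverse order. */
void toy_leave(void) {
    size_t floor;
    assert(toy_depth > 0);
    floor = toy_scopes[--toy_depth];
    while (toy_top > floor) {
        toy_top--;
        *toy_saves[toy_top].addr = toy_saves[toy_top].old;
    }
}

/* Returns the value seen inside the block; toy_var is restored on exit. */
int toy_demo(void) {
    int inside;
    toy_enter();
    toy_save_int(&toy_var);
    toy_var = 2;
    inside = toy_var;
    toy_leave();
    return inside;
}
```

In this sketch the restore happens only when C<toy_leave> is called; the
real save stack is also unwound on a non-local exit, which the toy does not
attempt to model.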

Inside such a I<pseudo-block> the following services are available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

=item C<SAVEI8(I8 i)>

=item C<SAVEI16(I16 i)>

=item C<SAVEBOOL(bool i)>

=item C<SAVESTRLEN(STRLEN i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of the enclosing I<pseudo-block>.

=for apidoc_section $callback
=for apidoc Amh||SAVEINT|int i
=for apidoc Amh||SAVEIV|IV i
=for apidoc Amh||SAVEI32|I32 i
=for apidoc Amh||SAVELONG|long i
=for apidoc Amh||SAVEI8|I8 i
=for apidoc Amh||SAVEI16|I16 i
=for apidoc Amh||SAVEBOOL|bool i
=for apidoc Amh||SAVESTRLEN|STRLEN i

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back; C<p> should be able to survive conversion to C<char*>
and back.

=for apidoc Amh||SAVESPTR|SV * s
=for apidoc Amh||SAVEPPTR|char * p

=item C<SAVERCPV(char **ppv)>

This macro arranges to restore the value of a C<char *> variable which
was allocated with a call to C<rcpv_new()> to its previous state when
the current pseudo block is completed. The pointer stored in C<*ppv> at
the time of the call will be refcount incremented and stored on the save
stack. Later when the current I<pseudo-block> is completed the value
stored in C<*ppv> will be refcount decremented, and the previous value
restored from the savestack which will also be refcount decremented.

This is the C<RCPV> equivalent of C<SAVEGENERICSV()>.

=for apidoc Amh||SAVERCPV|char *pv

=item C<SAVEGENERICSV(SV **psv)>

This macro arranges to restore the value of a C<SV *> variable to its
previous state when the current pseudo block is completed. The pointer
stored in C<*psv> at the time of the call will be refcount incremented
and stored on the save stack. Later when the current I<pseudo-block> is
completed the value stored in C<*psv> will be refcount decremented, and
the previous value restored from the savestack which will also be refcount
decremented. This is the C equivalent of C<local $sv>.

=for apidoc Amh||SAVEGENERICSV|char **psv

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of
I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope. These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=for apidoc Amh||SAVEFREESV|SV* sv

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count. This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=for apidoc Amh||SAVEMORTALIZESV|SV* sv

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is C<op_free()>ed at the end of I<pseudo-block>.

=for apidoc Amh||SAVEFREEOP|OP *op

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is C<Safefree()>ed at the
end of the current I<pseudo-block>.

=for apidoc Amh||SAVEFREEPV|char *pv

=item C<SAVEFREERCPV(char *pv)>

Ensures that a C<char *> which was created by a call to C<rcpv_new()> is
C<rcpv_free()>ed at the end of the current I<pseudo-block>.

This is the RCPV equivalent of C<SAVEFREESV()>.

=for apidoc Amh||SAVEFREERCPV|char *pv

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed. If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

    SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=for apidoc Amh||SAVEDELETE|HV * hv|char * key|I32 length

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p> which may be NULL.

=for apidoc Ayh||DESTRUCTORFUNC_NOCONTEXT_t
=for apidoc Amh||SAVEDESTRUCTOR|DESTRUCTORFUNC_NOCONTEXT_t f|void *p

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p> which may be NULL.

Note the I<end of the current pseudo-block> may occur much later than
the I<end of the current statement>. You may wish to look at the
C<MORTALDESTRUCTOR_X()> macro instead.

=for apidoc Ayh||DESTRUCTORFUNC_t
=for apidoc Amh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p

=item C<MORTALSVFUNC_X(SVFUNC_t f, SV *sv)>

At the end of I<the current statement> the function C<f> is called with
the implicit context argument (if any), and C<sv> which may be NULL.

Be aware that the parameter argument to the destructor function differs
from the related C<SAVEDESTRUCTOR_X()> in that it MUST be either NULL or
an C<SV*>.

Note the I<end of the current statement> may occur much before
the I<end of the current pseudo-block>. You may wish to look at the
C<SAVEDESTRUCTOR_X()> macro instead.

=for apidoc Amh||MORTALDESTRUCTOR_X|DESTRUCTORFUNC_t f|SV *sv

=item C<MORTALDESTRUCTOR_SV(SV *coderef, SV *args)>

At the end of I<the current statement> the Perl function contained in
C<coderef> is called with the arguments provided (if any) in C<args>.
See the documentation for C<mortal_destructor_sv()> for details on
how the C<args> parameter is handled.

Note the I<end of the current statement> may occur much before
the I<end of the current pseudo-block>. If you wish to call a perl
function at the end of the current pseudo block you should use the
C<SAVEDESTRUCTOR_X()> API instead, which will require you to create a
C wrapper to call the Perl function.

=for apidoc Amh||MORTALDESTRUCTOR_SV|SV *coderef|SV *args

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=for apidoc Amh||SAVESTACK_POS

=back

The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s). Where the above macros take C<int>, a similar
function takes C<int *>.

Other macros above have functions implementing them, but it's probably
best to just use the macro, and not those or the ones below.

=over 4

=item C<SV* save_scalar(GV *gv)>

=for apidoc save_scalar

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=for apidoc save_ary

=item C<HV* save_hash(GV *gv)>

=for apidoc save_hash

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

=for apidoc save_item

Duplicates the current value of C<SV>. On the exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block> the value of C<SV> will be restored
using the stored value. It doesn't handle magic. Use C<save_scalar> if
magic is affected.

=item C<SV* save_svref(SV **sptr)>

=for apidoc save_svref

Similar to C<save_scalar>, but will reinstate an C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

=for apidoc save_aptr
=for apidoc save_hptr

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>. People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument. Argument 0 is the first argument passed in the
Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives. However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro. The pushed values will often need to be "mortal" (see
L</Reference Counts and Mortality>):

    PUSHs(sv_2mortal(newSViv(an_integer)))
    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
    PUSHs(sv_2mortal(newSVnv(a_double)))
    PUSHs(sv_2mortal(newSVpv("Some String",0)))
    /* Although the last example is better written as the more
     * efficient: */
    PUSHs(newSVpvs_flags("Some String", SVs_TEMP))

Now, in the Perl program calling C<tzname>, the two values will be assigned
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method of pushing values on the stack is
to use the macro:

    XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.

Despite their suggestions in earlier versions of this document the macros
C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.

For more information, consult L<perlxs> and L<perlxstut>.
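
The difference between C<EXTEND> followed by C<PUSHs> and the
self-extending C<XPUSHs> described in this section can be sketched with an
ordinary growable array. This is only an analogy in plain C, not the perl
API: C<toy_args>, C<toy_extend>, C<toy_push> and C<toy_xpush> are
hypothetical names, and the stack holds C<int>s rather than C<SV*>s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy argument stack; hypothetical names, not the perl API. */
struct toy_args { int *base; size_t len; size_t cap; };

/* Like EXTEND(SP, n): guarantee room for n more items.
 * Returns 0 on success, -1 if memory could not be obtained. */
int toy_extend(struct toy_args *s, size_t n) {
    size_t new_cap;
    int *p;
    if (s->cap - s->len >= n)
        return 0;                    /* already enough room */
    new_cap = s->len + n;
    p = realloc(s->base, new_cap * sizeof *p);
    if (p == NULL)
        return -1;
    s->base = p;                     /* the block may have moved */
    s->cap  = new_cap;
    return 0;
}

/* Like PUSHs: the caller must already have made room. */
void toy_push(struct toy_args *s, int v) {
    assert(s->len < s->cap);
    s->base[s->len++] = v;
}

/* Like XPUSHs: make room for one item if needed, then push. */
int toy_xpush(struct toy_args *s, int v) {
    if (toy_extend(s, 1) != 0)
        return -1;
    toy_push(s, v);
    return 0;
}
```

A caller that knows it will return two values can call C<toy_extend(&s, 2)>
once and then C<toy_push> twice, while C<toy_xpush> pays for the capacity
check on every push.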

=head2 Autoloading with XSUBs

If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable
of the XSUB's package.

But it also puts the same information in certain fields of the XSUB itself:

    HV *stash           = CvSTASH(cv);
    const char *subname = SvPVX(cv);
    STRLEN name_length  = SvCUR(cv); /* in bytes */
    U32 is_utf8         = SvUTF8(cv);

C<SvPVX(cv)> contains just the sub name itself, not including the package.
For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package.

B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the
XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need
to support 5.8-5.14, use the XSUB's fields.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program. These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, char**);

The routine most often used is C<call_sv>. The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine. The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of arguments that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack. The tools for doing so include the
following macros and functions:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.

=head2 Putting a C value on Perl stack

A lot of opcodes (an opcode is an elementary operation in the internal perl
stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L</Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on the stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on the stack.

The macro to put this target on the stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[iunp]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack. The following code will not do what you think:

    XPUSHi(10);
    XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20.
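
The aliasing can be reproduced in a few lines of plain C. This is only an
analogy under stated assumptions: C<toy_sv>, C<toy_targ> and
C<toy_push_targ> are hypothetical names rather than anything in the perl
API, and the "stack" is an array of pointers in the same way that the Perl
stack is an array of C<SV*>.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; hypothetical names, not the perl API. */
struct toy_sv { int iv; };

struct toy_sv  toy_targ;          /* the single reusable target       */
struct toy_sv *toy_stack[4];      /* stack of pointers, like SV **    */
size_t         toy_sp = 0;        /* number of items on the stack     */

/* Like XPUSHi: set the shared target, then push a pointer to it. */
void toy_push_targ(int value) {
    assert(toy_sp < 4);
    toy_targ.iv = value;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Pushes 10 and then 20, and reports what the two slots read back as,
 * packed as first * 100 + second. */
int toy_targ_demo(void) {
    toy_sp = 0;
    toy_push_targ(10);
    toy_push_targ(20);
    /* Both slots alias toy_targ, so both read back as 20. */
    return toy_stack[0]->iv * 100 + toy_stack[1]->iv;
}
```

Here C<toy_targ_demo()> yields 2020 rather than 1020, because the second
assignment overwrote the one object that both stack slots point to.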

If you need to push multiple different values then you should either use
the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an
SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make
this a little easier to achieve by creating a new mortal for you (via
C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
Thus, instead of writing this to "fix" the example above:

    XPUSHs(sv_2mortal(newSViv(10)))
    XPUSHs(sv_2mortal(newSViv(20)))

you can simply write:

    mXPUSHi(10)
    mXPUSHi(20)

On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>. See also
C<dTARGET> and C<dXSTARG>.

=head2 Scratchpads

The question remains on when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current
unit--a subroutine or a file (for opcodes for statements outside of
subroutines)--is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.

A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes. A previous version of this document
stated that one can deduce that an SV lives on a scratchpad
by looking on its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set. But this has never been fully true.
C<SVs_PADMY> could be set on a variable that no longer resides in any pad.
While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables
that have never resided in a pad, but nonetheless act like I<target>s. As
of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as
0. C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>.

=for apidoc_section $pad
=for apidoc Amnh||SVs_PADTMP
=for apidoc AmnhD||SVs_PADMY

The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

In fact it is not 100% true that a compiled unit contains a pointer to
the scratchpad AV. In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe B<threads>. Both
these can create several execution pointers going into the same
subroutine. For the subroutine-child not to write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.

=head1 Memory Allocation

=head2 Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section.
The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

The following three macros are used to initially allocate memory:

    Newx(pointer, number, type);
    Newxc(pointer, number, type, cast);
    Newxz(pointer, number, type);

The first argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The second and third arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated. The argument
C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
to zero out all the newly allocated memory.

=head2 Reallocation

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer)

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
match those of C<Newx> and C<Newxc>.

=head2 Moving

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory. The C<source> and C<dest> arguments point to the source and
destination starting points. Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
function).

=head1 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used.
This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with. All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

    $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:

    $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line.
The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), of which only 3 are
not optimized away (one per number in the left column). The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, thus it is
C<3 4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children.
In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<OpSIBLING> pointer from the first child to the last (but
see below).

=for apidoc_section $optree_construction
=for apidoc Ayh||OP
=for apidoc Ayh||BINOP
=for apidoc Ayh||LISTOP
=for apidoc Ayh||UNOP

There are also some other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.

=for apidoc Ayh||LOOP
=for apidoc Ayh||PMOP

Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one
or more children, but it doesn't have an C<op_last> field: so you have to
follow C<op_first> and then the C<OpSIBLING> chain itself to find the
last child. Instead it has an C<op_other> field, which is comparable to
the C<op_next> field described below, and represents an alternate
execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note
that in general, C<op_other> may not point to any of the direct children
of the C<LOGOP>.

=for apidoc Ayh||LOGOP

Starting in version 5.21.2, perls built with the experimental
define C<-DPERL_OP_PARENT> add an extra boolean flag for each op,
C<op_moresib>. When not set, this indicates that this is the last op in an
C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last
sibling to point back to the parent op. Under this build, that field is
also renamed C<op_sibparent> to reflect its joint role.
The macro
C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on
the last sibling. With this build the C<op_parent(o)> function can be
used to find the parent of any op. Thus for forward compatibility, you
should always use the C<OpSIBLING(o)> macro rather than accessing
C<op_sibling> directly.

Another way to examine the tree is to use a compiler back-end module, such
as L<B::Concise>.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass. This is optimization by
so-called "check routines". The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread. Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, the check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines (or C<convert>), which in turn are
called from F<perly.y>.

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called the returned node is
checked for being compile-time executable.
If it is (the value is
judged to be constant) it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead. The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.

=head2 Compile pass 2: context propagation

When a context for a part of the compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of being free()d (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). Optimizations performed
at this stage are subject to the same restrictions as in pass 2.

Peephole optimizations are done by calling the function pointed to
by the global variable C<PL_peepp>. By default, C<PL_peepp> just
calls the function pointed to by the global variable C<PL_rpeepp>.
By default, that performs some basic op fixups and optimisations along
the execution-order op chain, and recursively calls C<PL_rpeepp> for
each side chain of ops (resulting from conditionals).
Extensions may
provide additional optimisations or fixups, hooking into either the
per-subroutine or recursive stage, like this:

    static peep_t prev_peepp;
    static void my_peep(pTHX_ OP *o)
    {
        /* custom per-subroutine optimisation goes here */
        prev_peepp(aTHX_ o);
        /* custom per-subroutine optimisation may also go here */
    }
    BOOT:
        prev_peepp = PL_peepp;
        PL_peepp = my_peep;

    static peep_t prev_rpeepp;
    static void my_rpeep(pTHX_ OP *first)
    {
        /* o advances two ops per iteration and t one, so the walk
         * stops if the chain loops back on itself */
        OP *o = first, *t = first;
        for(; o; o = o->op_next, t = t->op_next) {
            /* custom per-op optimisation goes here */
            o = o->op_next;
            if (!o || o == t) break;
            /* custom per-op optimisation goes AND here */
        }
        prev_rpeepp(aTHX_ first);
    }
    BOOT:
        prev_rpeepp = PL_rpeepp;
        PL_rpeepp = my_rpeep;

=for apidoc_section $optree_manipulation
=for apidoc Ayh||peep_t

=head2 Pluggable runops

The compile tree is executed in a runops function. There are two runops
functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
control over the execution of the compile tree it is possible to provide
your own runops function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs. Then, in the BOOT section of your XS
file, add the line:

    PL_runops = my_runops;

=for apidoc_section $debugging
=for apidoc runops_debug
=for apidoc runops_standard
=for apidoc Amnh|runops_proc_t|PL_runops

This function should be as efficient as possible to keep your programs
running as fast as possible.

=head2 Compile-time scope hooks

As of perl 5.14 it is possible to hook into the compile-time lexical
scope mechanism using C<Perl_blockhook_register>.
This is used like
this:

    STATIC void my_start_hook(pTHX_ int full);
    STATIC BHK my_hooks;

    BOOT:
        BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
        Perl_blockhook_register(aTHX_ &my_hooks);

This will arrange to have C<my_start_hook> called at the start of
compiling every lexical scope. The available hooks are:

=for apidoc_section $lexer
=for apidoc Ayh||BHK

=over 4

=item C<void bhk_start(pTHX_ int full)>

This is called just after starting a new lexical scope. Note that Perl
code like

    if ($x) { ... }

creates two scopes: the first starts at the C<(> and has C<full == 1>,
the second starts at the C<{> and has C<full == 0>. Both end at the
C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything
pushed onto the save stack by this hook will be popped just before the
scope ends (between the C<pre_> and C<post_end> hooks, in fact).

=item C<void bhk_pre_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just before unwinding the
stack. I<o> is the root of the optree representing the scope; it is a
double pointer so you can replace the OP if you need to.

=item C<void bhk_post_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just after unwinding the
stack. I<o> is as above. Note that it is possible for calls to C<pre_>
and C<post_end> to nest, if there is something on the save stack that
calls string eval.

=item C<void bhk_eval(pTHX_ OP *const o)>

This is called just before starting to compile an C<eval STRING>, C<do
FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
OP that requested the eval, and will normally be an C<OP_ENTEREVAL>,
C<OP_DOFILE> or C<OP_REQUIRE>.

=back

Once you have your hook functions, you need a C<BHK> structure to put
them in.
It's best to allocate it statically, since there is no way to
free it once it's registered. The function pointers should be inserted
into this structure using the C<BhkENTRY_set> macro, which will also set
flags indicating which entries are valid. If you do need to allocate
your C<BHK> dynamically for some reason, be sure to zero it before you
start.

Once registered, there is no mechanism to switch these hooks off, so if
that is necessary you will need to do this yourself. An entry in C<%^H>
is probably the best way, so the effect is lexically scoped; however it
is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to
temporarily switch entries on and off. You should also be aware that
generally speaking at least one scope will have opened before your
extension is loaded, so you will see some C<pre>/C<post_end> pairs that
didn't have a matching C<start>.

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

=for apidoc_section $debugging
=for apidoc dump_eval

Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
op tree, and C<Perl_dump_packsubs>, which calls C<Perl_dump_sub> on all the
subroutines in a package, like so (thankfully, these are all xsubs, so
there is no op tree):

=for apidoc_section $debugging
=for apidoc dump_sub

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

Finally, C<Perl_dump_all> dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and MULTIPLICITY

=for apidoc_section $concurrency
=for apidoc Amnh||PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use. This smells a lot like an object, and
there is a way for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure,
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.

The macro that controls the major Perl build flavor is MULTIPLICITY. The
MULTIPLICITY build has a C structure that packages all the interpreter
state, which is being passed to various perl functions as a "hidden"
first argument. MULTIPLICITY makes multi-threaded perls possible (with the
ithreads threading model, related to the macro USE_ITHREADS.)

PERL_IMPLICIT_CONTEXT is a legacy synonym for MULTIPLICITY.
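
As a rough illustration of the "hidden first argument" idea, the following
plain C sketch (no Perl headers involved) compiles one function either with
or without a context parameter. All names in it (C<toy_interp>,
C<TOY_MULTIPLICITY>, C<toy_pTHX_>, C<toy_aTHX_>, C<TOY_STATE>,
C<toy_counter_add>) are hypothetical and only mimic the shape of the real
macros, which live in F<perl.h> and are considerably more involved.

```c
/* Hypothetical stand-in for the structure that packages all
 * interpreter state; the real one is far larger. */
typedef struct toy_interp {
    long counter;               /* one piece of per-interpreter state */
} toy_interp;

#define TOY_MULTIPLICITY 1      /* set to 0 to model the other build */

#if TOY_MULTIPLICITY
   /* State lives in a structure passed as a hidden first argument. */
#  define toy_pTHX_         toy_interp *my_perl,
#  define toy_aTHX_         my_perl,
#  define TOY_STATE(field)  (my_perl->field)
#else
   /* State lives in a single global; there is no extra argument. */
   static toy_interp toy_global;
#  define toy_pTHX_
#  define toy_aTHX_
#  define TOY_STATE(field)  (toy_global.field)
#endif

/* The trailing underscore means "a comma follows if needed". */
static long
toy_counter_add_impl(toy_pTHX_ long n)
{
    TOY_STATE(counter) += n;
    return TOY_STATE(counter);
}

/* Callers write toy_counter_add(n) in either build. */
#define toy_counter_add(n)  toy_counter_add_impl(toy_aTHX_ n)
```

With C<TOY_MULTIPLICITY> set to 1, a caller needs a variable named
C<my_perl> in scope, and two C<toy_interp> structures keep independent
counters; with it set to 0 the same call updates the single global.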

=for apidoc_section $concurrency
=for apidoc Amnh||MULTIPLICITY

To see whether you have non-const data you can use a BSD (or GNU)
compatible C<nm>:

    nm libperl.a | grep -v ' [TURtr] '

If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>),
you have non-const data. The symbols the C<grep> removed are as follows:
C<Tt> are I<text>, or code, the C<Rr> are I<read-only> (const) data,
and the C<U> is I<undefined>, external symbols referred to.

The test F<t/porting/libperl.t> does this kind of symbol sanity
checking on C<libperl.a>.

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin with C<S_> are
private (think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal
Functions>.) The easiest way to be B<sure> a
function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), submit an issue at
L<https://github.com/Perl/perl5/issues> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing. To solve this, the subroutines are named and
declared in a particular way.
Here's a typical start of a static
function used within the Perl guts:

    STATIC void
    S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.

=for apidoc_section $directives
=for apidoc Ayh||STATIC

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

    void
    Perl_sv_setiv(pTHX_ SV* dsv, IV num)

C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

=for apidoc_section $concurrency
=for apidoc Amnh||aTHX
=for apidoc Amnh||aTHX_
=for apidoc Amnh||dTHX
=for apidoc Amnh||pTHX
=for apidoc Amnh||pTHX_

When Perl is built without options that set MULTIPLICITY, there is no
first argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
MULTIPLICITY is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.

When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setiv>.
It expands into
something like this:

    #ifdef MULTIPLICITY
      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setiv           Perl_sv_setiv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setiv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance. Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead
it does C<dTHX;> to get the context from thread-local storage. We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance. (Passing an arg is
cheaper than grabbing it from thread-local storage.)

You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

=for apidoc_section $concurrency
=for apidoc Amnh||dTHR

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with MULTIPLICITY, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow. The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with MULTIPLICITY enabled.

There are three ways to do this. First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

    sv_setiv(sv, num);

in your extension will translate to this when MULTIPLICITY is
in effect:

    Perl_sv_setiv(Perl_get_context(), sv, num);

or to this otherwise:

    Perl_sv_setiv(sv, num);

You don't have to do anything new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your Foo.xs:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    STATIC void my_private_function(int arg1, int arg2);

    STATIC void
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        ... call many Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, and a C<dTHX;> declaration at
the start of every function that will call the Perl API.
(You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.

=for apidoc_section $concurrency
=for apidoc AmnhU||PERL_NO_GET_CONTEXT

The third, even more efficient way is to ape how it is done within
the Perl guts:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    STATIC void my_private_function(pTHX_ int arg1, int arg2);

    STATIC void
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards. If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

    /* do this before doing anything else with some_perl */
    PERL_SET_CONTEXT(some_perl);

    ... other Perl API calls on some_perl go here ...

=for apidoc_section $embedding
=for apidoc Amh|void|PERL_SET_CONTEXT|PerlInterpreter* i

(You can always get the current context via C<PERL_GET_CONTEXT>.)

=for apidoc Amnh|PerlInterpreter*|PERL_GET_CONTEXT|

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as MULTIPLICITY provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
Windows.

This makes it possible to provide an extra pointer (called the "host"
environment) for all the system calls, so that all the system-level
code can maintain its own state, broken down into
seven C structures. These are thin wrappers around the usual system
calls (see F<win32/perllib.c>) for the default perl executable, but for a
more ambitious host (like the one that would do fork() emulation) all
the extra work needed to pretend that different interpreters are
actually different "processes" would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
either with or without the C<Perl_> prefix, thanks to a bunch of defines
that live in F<embed.h>. Note that extension code should I<not> set
C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
breakage of the XS in each new perl release.

The file F<embed.h> is generated automatically from
F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces. It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well. Here's a sample entry from
that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The first column is a set of flags, the second column the return type,
the third column the name. Columns after that are the arguments.
The flags are documented at the top of F<embed.fnc>.

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

    IVdf            IV in decimal
    UVuf            UV in decimal
    UVof            UV in octal
    UVxf            UV in hexadecimal
    NVef            NV %e-like
    NVff            NV %f-like
    NVgf            NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

    printf("IV is %" IVdf "\n", iv);

The C<IVdf> will expand to whatever is the correct format for the IVs.
Note that the spaces are required around the format in case the code is
compiled with C++, to maintain compliance with its standard.

Note that there are different "long doubles": Perl will use
whatever the compiler has.

If you are printing addresses of pointers, use %p or UVxf combined
with PTR2UV().

=head2 Formatted Printing of SVs

The contents of SVs may be printed using the C<SVf> format, like so:

    Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg))

where C<err_msg> is an SV.

=for apidoc_section $io_formats
=for apidoc Amnh||SVf
=for apidoc Amh||SVfARG|SV *sv

Not all scalar types are printable. Simple values certainly are: one of
IV, UV, NV, or PV. Also, if the SV is a reference to some value,
either it will be dereferenced and the value printed, or information
about the type of that value and its address are displayed. The results
of printing any other type of SV are undefined and likely to lead to an
interpreter crash. NVs are printed using a C<%g>-ish format.

Note that the spaces are required around the C<SVf> in case the code is
compiled with C++, to maintain compliance with its standard.
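
The need for spaces around C<IVdf> and C<SVf> follows from how such macros
are used: each expands to a string literal, and the compiler joins adjacent
literals into one format string. The standard C<PRId64> family in
C<inttypes.h> relies on the same technique, so it can be tried without any
Perl headers. In this sketch C<TOY_IVdf> and C<toy_format_iv> are
hypothetical names, and C<int64_t> merely stands in for C<IV>.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical analogue of IVdf: a string literal holding the
 * conversion for our integer type, without the leading '%'. */
#define TOY_IVdf PRId64

/* Format a 64-bit value into buf; returns what snprintf returns. */
static int
toy_format_iv(char *buf, size_t size, int64_t iv)
{
    /* "IV is %" TOY_IVdf "\n" is joined into one literal such as
     * "IV is %ld\n" or "IV is %lld\n", depending on the platform.
     * Without the spaces, C++11 and later could read the macro name
     * as a suffix attached to the preceding literal. */
    return snprintf(buf, size, "IV is %" TOY_IVdf "\n", iv);
}
```

The caller never needs to know whether the underlying type is C<long> or
C<long long>; only the macro changes from platform to platform.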
3146 3147Note that any filehandle being printed to under UTF-8 must be expecting 3148UTF-8 in order to get good results and avoid Wide-character warnings. 3149One way to do this for typical filehandles is to invoke perl with the 3150C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>. 3151 3152You can use this to concatenate two scalars: 3153 3154 SV *var1 = get_sv("var1", GV_ADD); 3155 SV *var2 = get_sv("var2", GV_ADD); 3156 SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf, 3157 SVfARG(var1), SVfARG(var2)); 3158 3159=for apidoc Amnh||SVf_QUOTEDPREFIX 3160 3161C<SVf_QUOTEDPREFIX> is similar to C<SVf> except that it restricts the 3162number of the characters printed, showing at most the first 3163C<PERL_QUOTEDPREFIX_LEN> characters of the argument, and rendering it with 3164double quotes and with the contents escaped using double quoted string 3165escaping rules. If the string is longer than this then ellipses "..." 3166will be appended after the trailing quote. This is intended for error 3167messages where the string is assumed to be a class name. 3168 3169=for apidoc Amnh||HvNAMEf 3170=for apidoc Amnh||HvNAMEf_QUOTEDPREFIX 3171 3172C<HvNAMEf> and C<HvNAMEf_QUOTEDPREFIX> are similar to C<SVf> except they 3173extract the string, length and utf8 flags from the argument using the 3174C<HvNAME()>, C<HvNAMELEN()>, C<HvNAMEUTF8()> macros. This is intended 3175for stringifying a class name directly from an stash HV. 3176 3177=head2 Formatted Printing of Strings 3178 3179If you just want the bytes printed in a 7bit NUL-terminated string, you can 3180just use C<%s> (assuming they are all really only 7bit). But if there is a 3181possibility the value will be encoded as UTF-8 or contains bytes above 3182C<0x7F> (and therefore 8bit), you should instead use the C<UTF8f> format. 
And as its parameter, use the C<UTF8fARG()> macro:

    char * msg;

    /* U+2018: \xE2\x80\x98 LEFT SINGLE QUOTATION MARK
       U+2019: \xE2\x80\x99 RIGHT SINGLE QUOTATION MARK */
    if (can_utf8)
      msg = "\xE2\x80\x98Uses fancy quotes\xE2\x80\x99";
    else
      msg = "'Uses simple quotes'";

    Perl_croak(aTHX_ "The message is: %" UTF8f "\n",
                     UTF8fARG(can_utf8, strlen(msg), msg));

The first parameter to C<UTF8fARG> is a boolean: 1 if the string is in
UTF-8; 0 if the string is in native byte encoding (Latin1).
The second parameter is the number of bytes in the string to print.
And the third and final parameter is a pointer to the first byte in the
string.

Note that any filehandle being printed to under UTF-8 must be expecting
UTF-8 in order to get good results and avoid Wide-character warnings.
One way to do this for typical filehandles is to invoke perl with the
C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.)

=for apidoc_section $io_formats
=for apidoc Amnh||UTF8f
Output a possibly UTF8 value. Be sure to use UTF8fARG() to compose
the arguments for this format.
=for apidoc Amnh||UTF8f_QUOTEDPREFIX
Same as C<UTF8f> but the output is quoted, escaped and length limited.
See C<SVf_QUOTEDPREFIX> for more details on escaping.
=for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str

=cut

=head2 Formatted Printing of C<Size_t> and C<SSize_t>

The most general way to do this is to cast them to a UV or IV, and
print as in the
L<previous section|/Formatted Printing of IVs, UVs, and NVs>.

But if you're using C<PerlIO_printf()>, it's less typing and visual
clutter to use the C<%z> length modifier (for I<siZe>):

    PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len);

This modifier is not portable, so its use should be restricted to
C<PerlIO_printf()>.
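
The two approaches can be compared side by side in a standalone
sketch. It makes two assumptions so that it builds without F<perl.h>:
C<unsigned long long> with C<%llu> stands in for a cast to UV printed
with C<UVuf>, and the C library's C<snprintf> (which needs C99 for
C<%zu>) stands in for C<PerlIO_printf()>. Both function names are
hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* General route: widen the size to a large unsigned type and print
 * that.  In Perl code the cast would be to UV and the format UVuf;
 * unsigned long long and %llu are stand-ins for this sketch. */
static int format_len_cast(char *buf, size_t buflen, size_t len)
{
    return snprintf(buf, buflen, "STRLEN is %llu\n",
                    (unsigned long long)len);
}

/* Shorter route: the z length modifier.  The C library only accepts it
 * from C99 onwards, which is why the text restricts its use to
 * PerlIO_printf(). */
static int format_len_z(char *buf, size_t buflen, size_t len)
{
    return snprintf(buf, buflen, "STRLEN is %zu\n", len);
}
```

Both helpers are meant to produce the same text for the same value;
the cast version is the one to reach for when the output function is
not known to understand C<%z>.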

=head2 Formatted Printing of C<Ptrdiff_t>, C<intmax_t>, C<short> and other special sizes

There are modifiers for these special situations if you are using
C<PerlIO_printf()>. See L<perlfunc/size>.

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

    PTR2UV(pointer)
    PTR2IV(pointer)
    PTR2NV(pointer)
    INT2PTR(pointertotype, integer)

=for apidoc_section $casting
=for apidoc Amh|type|INT2PTR|type|int value
=for apidoc Amh|UV|PTR2UV|void * ptr
=for apidoc Amh|IV|PTR2IV|void * ptr
=for apidoc Amh|NV|PTR2NV|void * ptr

For example:

    IV  iv = ...;
    SV *sv = INT2PTR(SV*, iv);

and

    AV *av = ...;
    UV  uv = PTR2UV(av);

There are also

    PTR2nat(pointer)   /* pointer to integer of PTRSIZE */
    PTR2ul(pointer)    /* pointer to unsigned long */

=for apidoc Amh|IV|PTR2nat|void *
=for apidoc Amh|unsigned long|PTR2ul|void *

And C<PTRV> which gives the native type for an integer the same size as
pointers, such as C<unsigned> or C<unsigned long>.

=for apidoc Ayh|type|PTRV

=head2 Exception Handling

There are a couple of macros to do very basic exception handling in XS
modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
be able to use these macros:

    #define NO_XSLOCKS
    #include "XSUB.h"

You can use these macros if you call code that may croak, but you need
to do some cleanup before giving control back to Perl. For example:

    dXCPT;    /* set up necessary variables */

    XCPT_TRY_START {
      code_that_may_croak();
    } XCPT_TRY_END

    XCPT_CATCH
    {
      /* do cleanup here */
      XCPT_RETHROW;
    }

Note that you always have to rethrow an exception that has been
caught.
Using these macros, it is not possible to just catch the
exception and ignore it. If you have to ignore the exception, you
have to use the C<call_*> function.

The advantage of using the above macros is that you don't have
to set up an extra function for C<call_*>, and that using these
macros is faster than using C<call_*>.

=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them -- L<perlapi> is one
such manual which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

    /*
    =for apidoc sv_setiv

    Copies an integer into the given SV.  Does not handle 'set' magic.  See
    L<perlapi/sv_setiv_mg>.

    =cut
    */

Please try and supply some documentation if you add functions to the
Perl core.

=head2 Backwards compatibility

The Perl API changes over time. New functions are
added or the interfaces of existing functions are
changed. The C<Devel::PPPort> module tries to
provide compatibility code for some of these changes, so XS writers don't
have to code it themselves when supporting multiple versions of Perl.

C<Devel::PPPort> generates a C header file F<ppport.h> that can also
be run as a Perl script. To generate F<ppport.h>, run:

    perl -MDevel::PPPort -eDevel::PPPort::WriteFile

Besides checking existing XS code, the script can also be used to retrieve
compatibility information for various API calls using the C<--api-info>
command line switch. For example:

    % perl ppport.h --api-info=sv_magicext

For details, see S<C<perldoc ppport.h>>.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
a variable number of bytes to represent a character. You can learn more
about Unicode and Perl's Unicode model in L<perlunicode>.

(On EBCDIC platforms, Perl instead uses UTF-EBCDIC, which is a form of
UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8.
UTF-EBCDIC is like UTF-8, but the details are different. The macros
hide the differences from you; just remember that the particular numbers
and bit patterns presented below will differ in UTF-EBCDIC.)

=head2 How can I recognise a UTF-8 string?

You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200, (C<0xC8> for you hex types)
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking -- this
is what makes Unicode input an interesting problem.

In general, you either have to know what you're dealing with, or you
have to guess. The API function C<is_utf8_string> can help; it'll tell
you if a string contains only valid UTF-8 characters, and the chances
of a non-UTF-8 string looking like valid UTF-8 become very small very
quickly with increasing string length. On a character-by-character
basis, C<isUTF8_CHAR>
will tell you whether the current character in a string is valid UTF-8.

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one
byte, just like good ol' ASCII. Character 128 is stored as
C<v194.128>; this continues up to character 191, which is
C<v194.191>. Now we've run out of bits (191 is binary
C<10111111>) so we move on; character 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
L<perlunicode/Unicode Encodings> has pictures of how this works.

Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over.
You're on your own about bounds checking, though, so don't use it
lightly.

All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (the C<UTF8_IS_INVARIANT()> is a macro that tests
whether the byte is encoded as a single byte even in UTF-8):

    U8 *utf;     /* Initialize this to point to the beginning of the
                    sequence to convert */
    U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence
                    pointed to by 'utf' */
    UV uv;       /* Returned code point; note: a UV, not a U8, not a
                    char */
    STRLEN len;  /* Returned length of character in bytes */

    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
value of the character; the inverse function C<uvchr_to_utf8> is available
for putting a UV into UTF-8:

    if (!UVCHR_IS_INVARIANT(uv))
        /* Must treat this as UTF8 */
        utf8 = uvchr_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!

(Note that we don't have to test for invariant characters in the
examples above. The functions work on any well-formed UTF-8 input.
It's just that it's faster to avoid the function overhead when it's not
needed.)
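
The byte patterns described in this section can be reproduced with a
small standalone encoder. This is only a sketch of the standard UTF-8
bit layout on ASCII platforms, not how the perl core implements
C<uvchr_to_utf8> or C<UTF8SKIP>; the names C<toy_encode_utf8> and
C<toy_utf8_skip> are hypothetical, and no validation of surrogates or
out-of-range values is attempted.

```c
#include <stddef.h>

/* Encode code point cp (assumed to be in 0 .. 0x10FFFF) as UTF-8 into
 * out, which must have room for 4 bytes.  Returns the byte count. */
static size_t toy_encode_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                  /* 0..127: one byte, as in ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                 /* 128..2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {               /* 2048..65535: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));   /* four bytes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Length in bytes of the character that starts with this lead byte,
 * or 0 if the byte is a continuation byte or not a valid lead byte. */
static size_t toy_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}
```

Feeding 128, 191 and 192 through the encoder yields the two-byte
sequences quoted above, and 2048 is the first value to need three
bytes.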

=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with UTF-8 strings and non-UTF-8 strings
slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
string is internally encoded as UTF-8. Without it, the byte value is the
codepoint number and vice versa. This flag is only meaningful if the SV
is C<SvPOK> or immediately after stringification via C<SvPV> or a
similar macro. You can check and manipulate this flag with the
following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
UTF-8 data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable (wrong) results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF-8, and contains a byte sequence that could be UTF-8 --
especially when combining non-UTF-8 and UTF-8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act
accordingly:

    p = SvPV(sv, len);
    is_utf8 = SvUTF8(sv);
    frobnicate(p, is_utf8);
    nsv = newSVpvn(p, len);
    if (is_utf8)
        SvUTF8_on(nsv);

In the above, your C<frobnicate> function has been changed to be made
aware of whether or not it's dealing with UTF-8 data, so that it can
handle the string appropriately.

Since just passing an SV to an XS function and copying the data of
the SV is not enough to copy the UTF8 flags, even less right is just
passing a S<C<char *>> to an XS function.

For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the
string in an SV is to be I<treated> as UTF-8. This takes into account
if the call to the XS function is being made from within the scope of
L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the
UTF-8 string are to be exposed, rather than the character they
represent. But this pragma should only really be used for debugging and
perhaps low-level testing at the byte level. Hence most XS code need
not concern itself with this, but various areas of the perl core do need
to support it.

And this isn't the whole story. Starting in Perl v5.12, strings that
aren't encoded in UTF-8 may also be treated as Unicode under various
conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>).
This is only really a problem for characters whose ordinals are between
128 and 255, and their behavior varies under ASCII versus Unicode rules
in ways that your code cares about (see L<perlunicode/The "Unicode Bug">).
There is no published API for dealing with this, as it is subject to
change, but you can look at the code for C<pp_lc> in F<pp.c> for an
example as to how it's currently done.

=head2 How do I pass a Perl string to a C library?

A Perl string, conceptually, is an opaque sequence of code points.
Many C libraries expect their inputs to be "classical" C strings, which are
arrays of octets 1-255, terminated with a NUL byte. Your job when writing
an interface between Perl and a C library is to define the mapping between
Perl and that library.

Generally speaking, C<SvPVbyte> and related macros suit this task well.
These assume that your Perl string is a "byte string", i.e., is either
raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8.

Alternatively, if your C library expects UTF-8 text, you can use
C<SvPVutf8> and related macros. This has the same effect as encoding
to UTF-8 then calling the corresponding C<SvPVbyte>-related macro.

Some C libraries may expect other encodings (e.g., UTF-16LE). To give
Perl strings to such libraries
you must either do that encoding in Perl then use C<SvPVbyte>, or
use an intermediary C library to convert from however Perl stores the
string to the desired encoding.

Take care also that NULs in your Perl string don't confuse the C
library. If possible, give the string's length to the C library; if that's
not possible, consider rejecting strings that contain NUL bytes.

=head3 What about C<SvPV>, C<SvPV_nolen>, etc.?

Consider a 3-character Perl string C<$foo = "\x64\x78\x8c">.
Perl can store these 3 characters either of two ways:

=over

=item * bytes: 0x64 0x78 0x8c

=item * UTF-8: 0x64 0x78 0xc2 0x8c

=back

Now let's say you convert C<$foo> to a C string thus:

    STRLEN strlen;
    char *str = SvPV(foo_sv, strlen);

At this point C<str> could point to a 3-byte C string or a 4-byte one.

Generally speaking, we want C<str> to be the same regardless of how
Perl stores C<$foo>, so the ambiguity here is undesirable. C<SvPVbyte>
and C<SvPVutf8> solve that by giving predictable output: use
C<SvPVbyte> if your C library expects byte strings, or C<SvPVutf8>
if it expects UTF-8.

If your C library happens to support both encodings, then C<SvPV>--always
in tandem with lookups to C<SvUTF8>!--may be safe and (slightly) more
efficient.
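
To see where the 3-byte and 4-byte forms of C<$foo> come from, here is
a minimal standalone sketch of re-encoding native bytes as UTF-8. It
assumes an ASCII platform where the native 8-bit encoding is Latin-1,
and C<toy_latin1_to_utf8> is a hypothetical helper written for this
illustration; it is not the implementation behind C<SvPVutf8> or
C<bytes_to_utf8>.

```c
#include <stddef.h>

/* Re-encode len Latin-1 bytes from src as UTF-8 into dst.  dst must
 * have room for 2 * len bytes, since every byte of 0x80 or above grows
 * to two bytes.  Returns the number of bytes written. */
static size_t toy_latin1_to_utf8(const unsigned char *src, size_t len,
                                 unsigned char *dst)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            /* Invariant: the same single byte in both encodings */
            dst[n++] = src[i];
        }
        else {
            /* 0x80..0xFF: lead byte 0xC2 or 0xC3, then a continuation */
            dst[n++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[n++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    return n;
}
```

Only the third character of C<$foo> changes representation, which is
why a C library that is handed the raw C<SvPV> buffer can see either
form depending on how Perl happens to store the string.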

B<TESTING> B<TIP:> Use L<utf8>'s C<upgrade> and C<downgrade> functions
in your tests to ensure consistent handling regardless of Perl's
internal encoding.

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
the non-UTF-8 strings to UTF-8. If you've got an SV, the easiest way to do
this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems in deficient code.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

=head2 How do I compare strings?

L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic
comparison of two SVs, and handle UTF-8ness properly. Note, however,
that Unicode specifies a much fancier mechanism for collation, available
via the L<Unicode::Collate> module.

To just compare two strings for equality/non-equality, you can just use
L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memEQ> as usual,
except the strings must be both UTF-8 or not UTF-8 encoded.

To compare two strings case-insensitively, use
L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have
the same UTF-8ness).

=head2 Is there anything else I need to know?

Not really.
Just remember these things:

=over 3

=item *

There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8
or not. But you can tell if an SV is to be treated as UTF-8 by calling
C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar
macro. And, you can tell if an SV is actually UTF-8 (even if it is not to
be treated as such) by looking at its C<SvUTF8> flag (again after
stringifying it). Don't forget to set the flag if something should be
UTF-8.
Treat the flag as part of the PV, even though it's not -- if you pass on
the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character UV to a UTF-8 string, B<always> use
C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is
tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded, and then combine them.

=back

=head1 Custom Operators

Custom operator support is an experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>.)

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure -- unary, binary, list and
so on -- you like.

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to do those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> to determine which custom op
it is dealing with. You should create an C<XOP> structure for each
ppaddr you use, set the properties of the custom op with
C<XopENTRY_set>, and register the structure against the ppaddr using
C<Perl_custom_op_register>. A trivial example might look like:

=for apidoc_section $optree_manipulation
=for apidoc Ayh||XOP

    static XOP my_xop;
    static OP *my_pp(pTHX);

    BOOT:
        XopENTRY_set(&my_xop, xop_name, "myxop");
        XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
        Perl_custom_op_register(aTHX_ my_pp, &my_xop);

The available fields in the structure are:

=over 4

=item xop_name

A short name for your op.
This will be included in some error messages,
and will also be returned as C<< $op->name >> by the L<B|B> module, so
it will appear in the output of modules like L<B::Concise|B::Concise>.

=item xop_desc

A short description of the function of the op.

=item xop_class

Which of the various C<*OP> structures this op uses. This should be one of
the C<OA_*> constants from F<op.h>, namely

=over 4

=item OA_BASEOP

=item OA_UNOP

=item OA_BINOP

=item OA_LOGOP

=item OA_LISTOP

=item OA_PMOP

=item OA_SVOP

=item OA_PADOP

=item OA_PVOP_OR_SVOP

This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead.

=item OA_LOOP

=item OA_COP

=for apidoc_section $optree_manipulation
=for apidoc Amnh||OA_BASEOP
=for apidoc_item OA_BINOP
=for apidoc_item OA_COP
=for apidoc_item OA_LISTOP
=for apidoc_item OA_LOGOP
=for apidoc_item OA_LOOP
=for apidoc_item OA_PADOP
=for apidoc_item OA_PMOP
=for apidoc_item OA_PVOP_OR_SVOP
=for apidoc_item OA_SVOP
=for apidoc_item OA_UNOP

=back

The other C<OA_*> constants should not be used.

=item xop_peep

This member is of type C<Perl_cpeep_t>, which expands to C<void
(*Perl_cpeep_t)(aTHX_ OP *o, OP *oldop)>. If it is set, this function
will be called from C<Perl_rpeep> when ops of this type are encountered
by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.

=for apidoc_section $optree_manipulation
=for apidoc Ayh||Perl_cpeep_t

=back

C<B::Generate> directly supports the creation of custom ops by name.

=head1 Stacks

Descriptions above occasionally refer to "the stack", but there are in fact
many stack-like data structures within the perl interpreter. When otherwise
unqualified, "the stack" usually refers to the value stack.

The various stacks have different purposes, and operate in slightly different
ways. Their differences are noted below.

=head2 Value Stack

This stack stores the values that regular perl code is operating on, usually
intermediate values of expressions within a statement. The stack itself is
formed of an array of SV pointers.

The base of this stack is pointed to by the interpreter variable
C<PL_stack_base>, of type C<SV **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_stack_base

The head of the stack is C<PL_stack_sp>, and points to the most
recently-pushed item.

=for apidoc Amnh||PL_stack_sp

Items are pushed to the stack by using the C<PUSHs()> macro or its variants
described above; C<XPUSHs()>, C<mPUSHs()>, C<mXPUSHs()> and the typed
versions. Note carefully that the non-C<X> versions of these macros do not
check the size of the stack and assume it to be big enough. These must be
paired with a suitable check of the stack's size, such as the C<EXTEND> macro
to ensure it is large enough. For example

    EXTEND(SP, 4);
    mPUSHi(10);
    mPUSHi(20);
    mPUSHi(30);
    mPUSHi(40);

This is slightly more performant than making four separate checks in four
separate C<mXPUSHi()> calls.

As a further performance optimisation, the various C<PUSH> macros all operate
using a local variable C<SP>, rather than the interpreter-global variable
C<PL_stack_sp>. This variable is declared by the C<dSP> macro - though it is
normally implied by XSUBs and similar so it is rare you have to consider it
directly.
Once declared, the C<PUSH> macros will operate only on this local
variable, so before invoking any other perl core functions you must use the
C<PUTBACK> macro to return the value from the local C<SP> variable back to
the interpreter variable. Similarly, after calling a perl core function which
may have had reason to move the stack or push/pop values to it, you must use
the C<SPAGAIN> macro which refreshes the local C<SP> value back from the
interpreter one.

Items are popped from the stack by using the C<POPs> macro or its typed
versions. There is also a macro C<TOPs> that inspects the topmost item without
removing it.

=for apidoc_section $stack
=for apidoc Amnh||TOPs

Note specifically that SV pointers on the value stack do not contribute to the
overall reference count of the xVs being referred to. If newly-created xVs are
being pushed to the stack you must arrange for them to be destroyed at a
suitable time; usually by using one of the C<mPUSH*> macros or C<sv_2mortal()>
to mortalise the xV.

=head2 Mark Stack

The value stack stores individual perl scalar values as temporaries between
expressions. Some perl expressions operate on entire lists; for that purpose
we need to know where on the stack each list begins. This is the purpose of the
mark stack.

The mark stack stores integers as I32 values, which are the height of the
value stack at the time before the list began; thus the mark itself actually
points to the value stack entry one before the list. The list itself starts at
C<mark + 1>.

The base of this stack is pointed to by the interpreter variable
C<PL_markstack>, of type C<I32 *>.

=for apidoc_section $stack
=for apidoc Amnh||PL_markstack

The head of the stack is C<PL_markstack_ptr>, and points to the most
recently-pushed item.

=for apidoc Amnh||PL_markstack_ptr

Items are pushed to the stack by using the C<PUSHMARK()> macro. Even though
the stack itself stores (value) stack indices as integers, the C<PUSHMARK>
macro should be given a stack pointer directly; it will calculate the index
offset by comparing to the C<PL_stack_sp> variable. Thus almost always the
code to perform this is

    PUSHMARK(SP);

Items are popped from the stack by the C<POPMARK> macro. There is also a macro
C<TOPMARK> that inspects the topmost item without removing it. These macros
return I32 index values directly. There is also the C<dMARK> macro which
declares a new SV double-pointer variable, called C<mark>, which points at the
marked stack slot; this is the usual macro that C code will use when operating
on lists given on the stack.

As noted above, the C<mark> variable itself will point at the most recently
pushed value on the value stack before the list begins, and so the list itself
starts at C<mark + 1>. The values of the list may be iterated by code such as

    for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) {
      SV *item = *svp;
      ...
    }

Note specifically in the case that the list is already empty, C<mark> will
equal C<PL_stack_sp>.

Because the C<mark> variable is converted to a pointer on the value stack,
extra care must be taken if C<EXTEND> or any of the C<XPUSH> macros are
invoked within the function, because the stack may need to be moved to
extend it and so the existing pointer will now be invalid. If this may be a
problem, a possible solution is to track the mark offset as an integer and
track the mark itself later on after the stack had been moved.

    I32 markoff = POPMARK;

    ...

    SV **mark = PL_stack_base + markoff;

=head2 Temporaries Stack

As noted above, xV references on the main value stack do not contribute to the
reference count of an xV, and so another mechanism is used to track when
temporary values which live on the stack must be released. This is the job of
the temporaries stack.

The temporaries stack stores pointers to xVs whose reference counts will be
decremented soon.

The base of this stack is pointed to by the interpreter variable
C<PL_tmps_stack>, of type C<SV **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_tmps_stack

The head of the stack is indexed by C<PL_tmps_ix>, an integer which stores the
index in the array of the most recently-pushed item.

=for apidoc Amnh||PL_tmps_ix

There is no public API to directly push items to the temporaries stack. Instead,
the API function C<sv_2mortal()> is used to mortalize an xV, adding its
address to the temporaries stack.

Likewise, there is no public API to read values from the temporaries stack.
Instead, the macros C<SAVETMPS> and C<FREETMPS> are used. The C<SAVETMPS>
macro establishes the base levels of the temporaries stack, by capturing the
current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous
value to the save stack. Thereafter, whenever C<FREETMPS> is invoked all of
the temporaries that have been pushed since that level are reclaimed.

=for apidoc_section $stack
=for apidoc Amnh||PL_tmps_floor

While it is common to see these two macros in pairs within an C<ENTER>/
C<LEAVE> pair, it is not necessary to match them. It is permitted to invoke
C<FREETMPS> multiple times since the most recent C<SAVETMPS>; for example in a
loop iterating over elements of a list. While you can invoke C<SAVETMPS>
multiple times within a scope pair, it is unlikely to be useful.
Subsequent invocations will move the temporaries floor further up, thus
effectively trapping the existing temporaries to only be released at the end of
the scope.

=head2 Save Stack

The save stack is used by perl to implement the C<local> keyword and other
similar behaviours: that is, any cleanup operations that need to be performed
when leaving the current scope. Items pushed to this stack generally capture the
current value of some internal variable or state, which will be restored when
the scope is unwound due to leaving, C<return>, C<die>, C<goto> or other
reasons.

Whereas other perl internal stacks store individual items all of the same type
(usually SV pointers or integers), the items pushed to the save stack are
formed of many different types, having multiple fields to them. For example,
the C<SAVEt_INT> type needs to store both the address of the C<int> variable
to restore, and the value to restore it to. This information could have been
stored using fields of a C<struct>, but it would have to be large enough to
store three pointers in the largest case, which would waste a lot of space in
most of the smaller cases.

=for apidoc_section $stack
=for apidoc Amnh||SAVEt_INT

Instead, the stack stores information in a variable-length encoding of C<ANY>
structures. The final value pushed is stored in the C<UV> field which encodes
the kind of item held by the preceding items; the count and types of which
will depend on what kind of item is being stored. The kind field is pushed
last because that will be the first field to be popped when unwinding items
from the stack.

The base of this stack is pointed to by the interpreter variable
C<PL_savestack>, of type C<ANY *>.
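
The variable-length encoding described above can be sketched with a small
self-contained model. This is only an illustration under stated assumptions,
not perl's actual implementation: the C<toy_any> union, the C<TOY_SAVE_*> tags
and every C<toy_*> function are hypothetical names invented for this sketch,
and the real C<ANY> union and C<SAVEt_*> types carry more detail. It shows
entries of two different lengths, each with its kind pushed last so that
unwinding can read the kind first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for perl's ANY union: one slot holds a pointer,
 * an int, or an unsigned tag. */
typedef union {
    void    *any_ptr;
    int      any_int;
    unsigned any_tag;
} toy_any;

/* Hypothetical entry kinds: one needs 3 slots, the other only 2. */
enum { TOY_SAVE_INT = 1, TOY_SAVE_ZERO_INT = 2 };

#define TOY_STACK_MAX 64
static toy_any toy_savestack[TOY_STACK_MAX];
static int     toy_savestack_ix = 0;   /* index of the NEXT free slot */

/* Save address and current value of an int: 3 slots, tag last. */
static void toy_save_int(int *p)
{
    assert(toy_savestack_ix + 3 <= TOY_STACK_MAX);
    toy_savestack[toy_savestack_ix++].any_ptr = p;
    toy_savestack[toy_savestack_ix++].any_int = *p;
    toy_savestack[toy_savestack_ix++].any_tag = TOY_SAVE_INT;
}

/* Arrange for an int to be zeroed on scope exit: 2 slots, tag last. */
static void toy_save_zero_int(int *p)
{
    assert(toy_savestack_ix + 2 <= TOY_STACK_MAX);
    toy_savestack[toy_savestack_ix++].any_ptr = p;
    toy_savestack[toy_savestack_ix++].any_tag = TOY_SAVE_ZERO_INT;
}

/* Unwind down to 'base': pop the tag first, then the fields it implies. */
static void toy_leave_scope(int base)
{
    while (toy_savestack_ix > base) {
        unsigned tag = toy_savestack[--toy_savestack_ix].any_tag;
        switch (tag) {
        case TOY_SAVE_INT: {
            int  val = toy_savestack[--toy_savestack_ix].any_int;
            int *p   = (int *)toy_savestack[--toy_savestack_ix].any_ptr;
            *p = val;
            break;
        }
        case TOY_SAVE_ZERO_INT: {
            int *p = (int *)toy_savestack[--toy_savestack_ix].any_ptr;
            *p = 0;
            break;
        }
        default:
            assert(!"unknown save type");
        }
    }
}

/* Returns 1 if both variables are restored once the scope is left. */
static int toy_save_demo(void)
{
    int depth = 7, flag = 0;
    int base  = toy_savestack_ix;      /* what a scope entry would remember */

    toy_save_int(&depth);
    depth = 99;
    toy_save_zero_int(&flag);
    flag = 1;

    toy_leave_scope(base);             /* what a scope exit would trigger */
    return depth == 7 && flag == 0 && toy_savestack_ix == base;
}
```

In this model the index always names the next free slot, so the tag of the
most recent entry sits at C<toy_savestack_ix - 1>, which is why it can be
popped before anything else is known about the entry.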

=for apidoc_section $stack
=for apidoc Amnh||PL_savestack

The head of the stack is indexed by C<PL_savestack_ix>, an integer which
stores the index in the array at which the next item should be pushed. (Note
that this is different to most other stacks, which reference the most
recently-pushed item).

=for apidoc_section $stack
=for apidoc Amnh||PL_savestack_ix

Items are pushed to the save stack by using the various C<SAVE...()> macros.
Many of these macros take a variable and store both its address and current
value on the save stack, ensuring that value gets restored on scope exit.

    SAVEI8(i8)
    SAVEI16(i16)
    SAVEI32(i32)
    SAVEINT(i)
    ...

There are also a variety of other special-purpose macros which save particular
types or values of interest. C<SAVETMPS> has already been mentioned above.
Others include C<SAVEFREEPV> which arranges for a PV (i.e. a string buffer) to
be freed, or C<SAVEDESTRUCTOR> which arranges for a given function pointer to
be invoked on scope exit. A full list of such macros can be found in
F<scope.h>.

There is no public API for popping individual values or items from the save
stack. Instead, via the scope stack, the C<ENTER> and C<LEAVE> pair form a way
to start and stop nested scopes. Leaving a nested scope via C<LEAVE> will
restore all of the saved values that had been pushed since the most recent
C<ENTER>.

=head2 Scope Stack

As with the mark stack to the value stack, the scope stack forms a pair with
the save stack. The scope stack stores the height of the save stack at which
nested scopes begin, and allows the save stack to be unwound back to that
point when the scope is left.

When perl is built with debugging enabled, there is a second part to this
stack storing human-readable string names describing the type of stack
context.
Each push operation saves the name as well as the height of the save
stack, and each pop operation checks the topmost name against what is expected,
causing an assertion failure if the name does not match.

The base of this stack is pointed to by the interpreter variable
C<PL_scopestack>, of type C<I32 *>. If enabled, the scope stack names are
stored in a separate array pointed to by C<PL_scopestack_name>, of type
C<const char **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_scopestack
=for apidoc Amnh||PL_scopestack_name

The head of the stack is indexed by C<PL_scopestack_ix>, an integer which
stores the index of the array or arrays at which the next item should be
pushed. (Note that this is different to most other stacks, which reference the
most recently-pushed item).

=for apidoc_section $stack
=for apidoc Amnh||PL_scopestack_ix

Values are pushed to the scope stack using the C<ENTER> macro, which begins a
new nested scope. Any items pushed to the save stack are then restored at the
matching invocation of the C<LEAVE> macro.

=head1 Dynamic Scope and the Context Stack

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

=head2 Introduction to the context stack

In Perl, dynamic scoping refers to the runtime nesting of things like
subroutine calls, evals etc, as well as the entering and exiting of block
scopes. For example, the restoring of a C<local>ised variable is
determined by the dynamic scope.

Perl tracks the dynamic scope by a data structure called the context
stack, which is an array of C<PERL_CONTEXT> structures, each of which is
itself a big union for all the types of context. Whenever a new scope is
entered (such as a block, a C<for> loop, or a subroutine call), a new
context entry is pushed onto the stack.
Similarly, when leaving a block or
returning from a subroutine call etc., a context is popped. Since the
context stack represents the current dynamic scope, it can be searched.
For example, C<next LABEL> searches back through the stack looking for a
loop context that matches the label; C<return> pops contexts until it
finds a sub or eval context or similar; C<caller> examines sub contexts on
the stack.

=for apidoc_section $concurrency
=for apidoc Cyh||PERL_CONTEXT

Each context entry is labelled with a context type, C<cx_type>. Typical
context types are C<CXt_SUB>, C<CXt_EVAL> etc., as well as C<CXt_BLOCK>
and C<CXt_NULL> which represent a basic scope (as pushed by C<pp_enter>)
and a sort block respectively. The type determines which parts of the
context union are valid.

=for apidoc Cyh ||cx_type

=for apidoc Cmnh||CXt_BLOCK
=for apidoc_item ||CXt_EVAL
=for apidoc_item ||CXt_FORMAT
=for apidoc_item ||CXt_GIVEN
=for apidoc_item ||CXt_LOOP_ARY
=for apidoc_item ||CXt_LOOP_LAZYIV
=for apidoc_item ||CXt_LOOP_LAZYSV
=for apidoc_item ||CXt_LOOP_LIST
=for apidoc_item ||CXt_LOOP_PLAIN
=for apidoc_item ||CXt_NULL
=for apidoc_item ||CXt_SUB
=for apidoc_item ||CXt_SUBST
=for apidoc_item ||CXt_WHEN

The main division in the context struct is between a substitution scope
(C<CXt_SUBST>) and block scopes, which are everything else. The former is
just used while executing C<s///e>, and won't be discussed further
here.

All the block scope types share a common base, which corresponds to
C<CXt_BLOCK>. This stores the old values of various scope-related
variables like C<PL_curpm>, as well as information about the current
scope, such as C<gimme>. On scope exit, the old variables are restored.

Particular block scope types store extra per-type information.
For
example, C<CXt_SUB> stores the currently executing CV, while the various
for loop types might hold the original loop variable SV. On scope exit,
the per-type data is processed; for example the CV has its reference count
decremented, and the original loop variable is restored.

The macro C<cxstack> returns the base of the current context stack, while
C<cxstack_ix> is the index of the current frame within that stack.

=for apidoc_section $concurrency
=for apidoc Cmnh|PERL_CONTEXT *|cxstack
=for apidoc Cmnh|I32|cxstack_ix

In fact, the context stack is part of a stack-of-stacks system;
whenever something unusual is done such as calling a C<DESTROY> or tie
handler, a new stack is pushed, then popped at the end.

Note that the API described here changed considerably in perl 5.24; prior
to that, big macros like C<PUSHBLOCK> and C<POPSUB> were used; in 5.24
they were replaced by the inline static functions described below. In
addition, the ordering and detail of how these macros/functions work
changed in many ways, often subtly. In particular they didn't handle
saving the savestack and temps stack positions, and required additional
C<ENTER>, C<SAVETMPS> and C<LEAVE> compared to the new functions. The
old-style macros will not be described further.


=head2 Pushing contexts

For pushing a new context, the two basic functions are
C<cx = cx_pushblock()>, which pushes a new basic context block and returns
its address, and a family of similar functions with names like
C<cx_pushsub(cx)> which populate the additional type-dependent fields in
the C<cx> struct. Note that C<CXt_NULL> and C<CXt_BLOCK> don't have their
own push functions, as they don't store any data beyond that pushed by
C<cx_pushblock>.

The fields of the context struct and the arguments to the C<cx_*>
functions are subject to change between perl releases, representing
whatever is convenient or efficient for that release.

A typical context stack pushing can be found in C<pp_entersub>; the
following shows a simplified and stripped-down example of a non-XS call,
along with comments showing roughly what each function does.

    dMARK;
    PERL_CONTEXT *cx;
    U8 gimme      = GIMME_V;
    bool hasargs  = cBOOL(PL_op->op_flags & OPf_STACKED);
    OP *retop     = PL_op->op_next;
    I32 old_ss_ix = PL_savestack_ix;
    CV *cv        = ....;

    /* ... make mortal copies of stack args which are PADTMPs here ... */

    /* ... do any additional savestack pushes here ... */

    /* Now push a new context entry of type 'CXt_SUB'; initially just
     * doing the actions common to all block types: */

    cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);

    /* this does (approximately):
        CXINC;              /* cxstack_ix++ (grow if necessary) */
        cx = CX_CUR();      /* and get the address of new frame */
        cx->cx_type        = CXt_SUB;
        cx->blk_gimme      = gimme;
        cx->blk_oldsp      = MARK - PL_stack_base;
        cx->blk_oldsaveix  = old_ss_ix;
        cx->blk_oldcop     = PL_curcop;
        cx->blk_oldmarksp  = PL_markstack_ptr - PL_markstack;
        cx->blk_oldscopesp = PL_scopestack_ix;
        cx->blk_oldpm      = PL_curpm;
        cx->blk_old_tmpsfloor = PL_tmps_floor;

        PL_tmps_floor        = PL_tmps_ix;
    */


    /* then update the new context frame with subroutine-specific info,
     * such as the CV about to be executed: */

    cx_pushsub(cx, cv, retop, hasargs);

    /* this does (approximately):
        cx->blk_sub.cv          = cv;
        cx->blk_sub.olddepth    = CvDEPTH(cv);
        cx->blk_sub.prevcomppad = PL_comppad;
        cx->cx_type            |= (hasargs) ?
                                   CXp_HASARGS : 0;
        cx->blk_sub.retop       = retop;
        SvREFCNT_inc_simple_void_NN(cv);
    */

=for apidoc_section $concurrency
=for apidoc Cmnh||CXINC

Note that C<cx_pushblock()> sets two new floors: for the args stack (to
C<MARK>) and the temps stack (to C<PL_tmps_ix>). While executing at this
scope level, every C<nextstate> (amongst others) will reset the args and
tmps stack levels to these floors. Note that since C<cx_pushblock> uses
the current value of C<PL_tmps_ix> rather than it being passed as an arg,
this dictates at what point C<cx_pushblock> should be called. In
particular, any new mortals which should be freed only on scope exit
(rather than at the next C<nextstate>) should be created first.

Most callers of C<cx_pushblock> simply set the new args stack floor to the
top of the previous stack frame, but for C<CXt_LOOP_LIST> it stores the
items being iterated over on the stack, and so sets C<blk_oldsp> to the
top of these items instead. Note that, contrary to its name, C<blk_oldsp>
doesn't always represent the value to restore C<PL_stack_sp> to on scope
exit.

Note the early capture of C<PL_savestack_ix> to C<old_ss_ix>, which is
later passed as an arg to C<cx_pushblock>. In the case of C<pp_entersub>,
this is because, although most values needing saving are stored in fields
of the context struct, an extra value needs saving only when the debugger
is running, and it doesn't make sense to bloat the struct for this rare
case. So instead it is saved on the savestack. Since this value gets
calculated and saved before the context is pushed, it is necessary to pass
the old value of C<PL_savestack_ix> to C<cx_pushblock>, to ensure that the
saved value gets freed during scope exit. For most users of
C<cx_pushblock>, where nothing needs pushing on the save stack,
C<PL_savestack_ix> is just passed directly as an arg to C<cx_pushblock>.
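
The reason for capturing the save stack index early can be seen in a
deliberately tiny model. Everything below is a hypothetical stand-in invented
for illustration (the C<mini_*> names, the two parallel arrays, the
C<capture_early> switch); it is not how perl stores save entries. It only
shows that a frame which records the index I<after> an extra save has been
pushed will not undo that save on scope exit.

```c
#include <assert.h>

#define MINI_SS_MAX 16
static int *mini_ss_addr[MINI_SS_MAX];  /* address of each saved int       */
static int  mini_ss_val[MINI_SS_MAX];   /* value to restore it to          */
static int  mini_ss_ix = 0;             /* index of the next free slot     */

/* Hypothetical context frame: only the field relevant to this sketch. */
typedef struct { int oldsaveix; } mini_cx;

static void mini_save_int(int *p)
{
    assert(mini_ss_ix < MINI_SS_MAX);
    mini_ss_addr[mini_ss_ix] = p;
    mini_ss_val[mini_ss_ix]  = *p;
    mini_ss_ix++;
}

/* Restore every entry pushed at or above 'oldsaveix'. */
static void mini_leave_scope(int oldsaveix)
{
    while (mini_ss_ix > oldsaveix) {
        mini_ss_ix--;
        *mini_ss_addr[mini_ss_ix] = mini_ss_val[mini_ss_ix];
    }
}

/* Returns the value of 'debug_state' seen after the scope has been left. */
static int mini_enter_leave(int capture_early)
{
    int debug_state = 1;
    int old_ss_ix   = mini_ss_ix;   /* captured before any extra saves */
    mini_cx cx;

    mini_save_int(&debug_state);    /* extra save made before the push */
    debug_state = 2;

    /* The frame records either the early index or the current one. */
    cx.oldsaveix = capture_early ? old_ss_ix : mini_ss_ix;

    mini_leave_scope(cx.oldsaveix);

    mini_ss_ix = old_ss_ix;         /* drop any entry left behind */
    return debug_state;
}
```

With the early index, C<mini_enter_leave(1)> returns 1 because the extra save
is undone; with the late index, C<mini_enter_leave(0)> returns 2 because the
frame never sees that entry.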

Note that where possible, values should be saved in the context struct
rather than on the save stack; it's much faster that way.

Normally C<cx_pushblock> should be immediately followed by the appropriate
C<cx_pushfoo>, with nothing between them; this is because if code
in-between could die (e.g. a warning upgraded to fatal), then the context
stack unwinding code in C<dounwind> would see (in the example above) a
C<CXt_SUB> context frame, but without all the subroutine-specific fields
set, and crashes would soon ensue.

=for apidoc dounwind

Where the two must be separate, initially set the type to C<CXt_NULL> or
C<CXt_BLOCK>, and later change it to C<CXt_foo> when doing the
C<cx_pushfoo>. This is exactly what C<pp_enteriter> does, once it's
determined which type of loop it's pushing.

=head2 Popping contexts

Contexts are popped using C<cx_popsub()> etc. and C<cx_popblock()>. Note
however, that unlike C<cx_pushblock>, neither of these functions actually
decrements the current context stack index; this is done separately using
C<CX_POP()>.

=for apidoc_section $concurrency
=for apidoc Cmh|void|CX_POP|PERL_CONTEXT* cx

There are two main ways that contexts are popped. During normal execution
as scopes are exited, functions like C<pp_leave>, C<pp_leaveloop> and
C<pp_leavesub> process and pop just one context using C<cx_popfoo> and
C<cx_popblock>. On the other hand, things like C<pp_return> and C<next>
may have to pop back several scopes until a sub or loop context is found,
and exceptions (such as C<die>) need to pop back contexts until an eval
context is found. Both of these are accomplished by C<dounwind()>, which
is capable of processing and popping all contexts above the target one.
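
The search-then-unwind pattern just described can be modelled in a few lines.
This is a sketch under stated assumptions rather than perl's code: the
C<toy_cxt> enum, the C<toy_*> functions and the C<toy_frames_processed>
counter are all hypothetical, and the per-frame cleanup that the real
functions perform is reduced to incrementing that counter.

```c
#include <assert.h>

/* Hypothetical context types, loosely mirroring block/loop/sub/eval. */
enum toy_cxt { TOY_BLOCK, TOY_LOOP, TOY_SUB, TOY_EVAL };

#define TOY_CX_MAX 32
static enum toy_cxt toy_cxstack[TOY_CX_MAX];
static int toy_cxstack_ix = -1;        /* index of the current (top) frame */
static int toy_frames_processed = 0;   /* stands in for per-frame cleanup  */

static void toy_cx_push(enum toy_cxt type)
{
    assert(toy_cxstack_ix + 1 < TOY_CX_MAX);
    toy_cxstack[++toy_cxstack_ix] = type;
}

/* Search downward from the top for the nearest frame of 'type';
 * returns its index, or -1 if there is none. */
static int toy_find_cx(enum toy_cxt type)
{
    int i;
    for (i = toy_cxstack_ix; i >= 0; i--)
        if (toy_cxstack[i] == type)
            return i;
    return -1;
}

/* Process and pop every frame above 'target_ix', leaving the target
 * frame as the current one. */
static void toy_unwind(int target_ix)
{
    while (toy_cxstack_ix > target_ix) {
        toy_frames_processed++;   /* type-specific, then common, cleanup */
        toy_cxstack_ix--;         /* only now is the frame given up      */
    }
}

/* Model an exception raised inside eval -> sub -> loop -> block.
 * Returns the index of the frame that is current afterwards. */
static int toy_die_demo(void)
{
    int target;

    toy_cx_push(TOY_EVAL);
    toy_cx_push(TOY_SUB);
    toy_cx_push(TOY_LOOP);
    toy_cx_push(TOY_BLOCK);

    target = toy_find_cx(TOY_EVAL);
    if (target < 0)
        return -1;
    toy_unwind(target);
    return toy_cxstack_ix;
}
```

Here the eval frame sits at index 0, so the three frames above it are
processed and the eval frame is left as the current one for its own handler
to deal with.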

Here is a typical example of context popping, as found in C<pp_leavesub>
(simplified slightly):

    U8 gimme;
    PERL_CONTEXT *cx;
    SV **oldsp;
    OP *retop;

    cx = CX_CUR();

    gimme = cx->blk_gimme;
    oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */

    if (gimme == G_VOID)
        PL_stack_sp = oldsp;
    else
        leave_adjust_stacks(oldsp, oldsp, gimme, 0);

    CX_LEAVE_SCOPE(cx);
    cx_popsub(cx);
    cx_popblock(cx);
    retop = cx->blk_sub.retop;
    CX_POP(cx);

    return retop;

=for apidoc_section $concurrency
=for apidoc Cmh||CX_CUR

The steps above are in a very specific order, designed to be the reverse
order of when the context was pushed. The first thing to do is to copy
and/or protect any return arguments and free any temps in the current
scope. Scope exits like an rvalue sub normally return a mortal copy of
their return args (as opposed to lvalue subs). It is important to make
this copy before the save stack is popped or variables are restored, or
bad things like the following can happen:

    sub f { my $x =...; $x }  # $x freed before we get to copy it
    sub f { /(...)/;    $1 }  # PL_curpm restored before $1 copied

Although we wish to free any temps at the same time, we have to be careful
not to free any temps which are keeping return args alive; nor to free the
temps we have just created while mortal copying return args. Fortunately,
C<leave_adjust_stacks()> is capable of making mortal copies of return args,
shifting args down the stack, and only processing those entries on the
temps stack that are safe to do so.

In void context no args are returned, so it's more efficient to skip
calling C<leave_adjust_stacks()>. Also in void context, a C<nextstate> op
is likely to be imminently called which will do a C<FREETMPS>, so there's
no need to do that either.

The next step is to pop savestack entries: C<CX_LEAVE_SCOPE(cx)> is just
defined as C<< LEAVE_SCOPE(cx->blk_oldsaveix) >>. Note that during the
popping, it's possible for perl to call destructors, call C<STORE> to undo
localisations of tied vars, and so on. Any of these can die or call
C<exit()>. In this case, C<dounwind()> will be called, and the current
context stack frame will be re-processed. Thus it is vital that all steps
in popping a context are done in such a way as to support reentrancy. The
other alternative, of decrementing C<cxstack_ix> I<before> processing the
frame, would lead to leaks and the like if something died halfway through,
or overwriting of the current frame.

=for apidoc_section $concurrency
=for apidoc Cmh|void|CX_LEAVE_SCOPE|PERL_CONTEXT* cx

C<CX_LEAVE_SCOPE> itself is safely re-entrant: if only half the savestack
items have been popped before dying and getting trapped by eval, then the
C<CX_LEAVE_SCOPE>s in C<dounwind> or C<pp_leaveeval> will continue where
the first one left off.

The next step is the type-specific context processing; in this case
C<cx_popsub>. In part, this looks like:

    cv = cx->blk_sub.cv;
    CvDEPTH(cv) = cx->blk_sub.olddepth;
    cx->blk_sub.cv = NULL;
    SvREFCNT_dec(cv);

where it's processing the just-executed CV. Note that before it decrements
the CV's reference count, it nulls the C<blk_sub.cv>. This means that if
it re-enters, the CV won't be freed twice. It also means that you can't
rely on such type-specific fields having useful values after the return
from C<cx_popfoo>.
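
The "null the field, then release" ordering can be demonstrated in isolation.
The sketch below is a minimal model, assuming hypothetical C<toy_cv> and
C<toy_frame> types and C<toy_*> helpers that exist only for this
illustration; re-entry is simulated simply by processing the same frame a
second time, whereas in perl it would arise from an unwind triggered during
the release.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object and context frame. */
typedef struct { int refcnt; int freed; } toy_cv;
typedef struct { toy_cv *cv; } toy_frame;

/* Drop one reference; tolerate NULL. Frees are counted so that a
 * double free would be visible to the caller. */
static void toy_refcnt_dec(toy_cv *cv)
{
    if (cv && --cv->refcnt == 0)
        cv->freed++;
}

/* Re-entrancy-safe pop: detach the pointer from the frame first,
 * then release it. A second pass finds NULL and does nothing. */
static void toy_popsub(toy_frame *cx)
{
    toy_cv *cv = cx->cv;
    cx->cv = NULL;
    toy_refcnt_dec(cv);
}

/* Returns how many times the object was freed. */
static int toy_reentry_demo(void)
{
    toy_cv    cv = { 1, 0 };
    toy_frame cx;

    cx.cv = &cv;
    toy_popsub(&cx);   /* first pass over the frame                  */
    toy_popsub(&cx);   /* same frame re-processed after an "unwind"  */
    return cv.freed;
}
```

Had C<toy_popsub> released the object while the frame still pointed at it, the
second pass would have decremented the count again; detaching first means the
object is freed exactly once.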

Next, C<cx_popblock> restores all the various interpreter vars to their
previous values or previous high water marks; it expands to:

    PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
    PL_scopestack_ix = cx->blk_oldscopesp;
    PL_curpm         = cx->blk_oldpm;
    PL_curcop        = cx->blk_oldcop;
    PL_tmps_floor    = cx->blk_old_tmpsfloor;

Note that it I<doesn't> restore C<PL_stack_sp>; as mentioned earlier,
which value to restore it to depends on the context type (specifically
C<for (list) {}>), and what args (if any) it returns; and that will
already have been sorted out earlier by C<leave_adjust_stacks()>.

Finally, the context stack pointer is actually decremented by C<CX_POP(cx)>.
After this point, it's possible that the current context frame could
be overwritten by other contexts being pushed. Although things like ties
and C<DESTROY> are supposed to work within a new context stack, it's best
not to assume this. Indeed on debugging builds, C<CX_POP(cx)> deliberately
sets C<cx> to null to detect code that is still relying on the field
values in that context frame. Note in the C<pp_leavesub()> example above,
we grab C<blk_sub.retop> I<before> calling C<CX_POP>.

=head2 Redoing contexts

Finally, there is C<cx_topblock(cx)>, which acts like a super-C<nextstate>
as regards resetting various vars to their base values. It is used in
places like C<pp_next>, C<pp_redo> and C<pp_goto> where rather than
exiting a scope, we want to re-initialise the scope. As well as resetting
C<PL_stack_sp> like C<nextstate>, it also resets C<PL_markstack_ptr>,
C<PL_scopestack_ix> and C<PL_curpm>. Note that it doesn't do a
C<FREETMPS>.


=head1 Slab-based operator allocation

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

Perl's internal error-handling mechanisms implement C<die> (and its internal
equivalents) using longjmp. If this occurs during lexing, parsing or
compilation, we must ensure that any ops allocated as part of the compilation
process are freed. (Older Perl versions did not adequately handle this
situation: when failing a parse, they would leak ops that were stored in
C C<auto> variables and not linked anywhere else.)

To handle this situation, Perl uses I<op slabs> that are attached to the
currently-compiling CV. A slab is a chunk of allocated memory. New ops are
allocated as regions of the slab. If the slab fills up, a new one is created
(and linked from the previous one). When an error occurs and the CV is freed,
any ops remaining are freed.

Each op is preceded by two pointers: one points to the next op in the slab, and
the other points to the slab that owns it. The next-op pointer is needed so
that Perl can iterate over a slab and free all its ops. (Op structures are of
different sizes, so the slab's ops can't merely be treated as a dense array.)
The slab pointer is needed for accessing a reference count on the slab: when
the last op on a slab is freed, the slab itself is freed.

The slab allocator puts the ops at the end of the slab first. This will tend to
allocate the leaves of the op tree first, and the layout will therefore
hopefully be cache-friendly. In addition, this means that there's no need to
store the size of the slab (see below on why slabs vary in size), because Perl
can follow pointers to find the last op.

It might seem possible to eliminate slab reference counts altogether, by having
all ops implicitly attached to C<PL_compcv> when allocated and freed when the
CV is freed. That would also allow C<op_free> to skip C<FreeOp> altogether, and
thus free ops faster.
But that doesn't work in those cases where ops need to
survive beyond their CVs, such as re-evals.

The CV also has to have a reference count on the slab. Sometimes the first op
created is immediately freed. If the reference count of the slab reaches 0,
then it will be freed with the CV still pointing to it.

CVs use the C<CVf_SLABBED> flag to indicate that the CV has a reference count
on the slab. When this flag is set, the slab is accessible via C<CvSTART> when
C<CvROOT> is not set, or by subtracting two pointers C<(2*sizeof(I32 *))> from
C<CvROOT> when it is set. The alternative to this approach of sneaking the slab
into C<CvSTART> during compilation would be to enlarge the C<xpvcv> struct by
another pointer. But that would make all CVs larger, even though slab-based op
freeing is typically of benefit only for programs that make significant use of
string eval.

=for apidoc_section $concurrency
=for apidoc Cmnh| |CVf_SLABBED
=for apidoc_item |OP *|CvROOT|CV * sv
=for apidoc_item |OP *|CvSTART|CV * sv

When the C<CVf_SLABBED> flag is set, the CV takes responsibility for freeing
the slab. If C<CvROOT> is not set when the CV is freed or undeffed, it is
assumed that a compilation error has occurred, so the op slab is traversed and
all the ops are freed.

Under normal circumstances, the CV forgets about its slab (decrementing the
reference count) when the root is attached. So the slab reference counting that
happens when ops are freed takes care of freeing the slab. In some cases, the
CV is told to forget about the slab (C<cv_forget_slab>) precisely so that the
ops can survive after the CV is done away with.

Forgetting the slab when the root is attached is not strictly necessary, but
avoids potential problems with C<CvROOT> being written over.
There is code all
over the place, both in core and on CPAN, that does things with C<CvROOT>, so
forgetting the slab makes things more robust and avoids potential problems.

Since the CV takes ownership of its slab when flagged, that flag is never
copied when a CV is cloned, as one CV could free a slab that another CV still
points to, since forced freeing of ops ignores the reference count (but asserts
that it looks right).

To avoid slab fragmentation, freed ops are marked as freed and attached to the
slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused
when possible. Not reusing freed ops would be simpler, but it would result in
significantly higher memory usage for programs with large C<if (DEBUG) {...}>
blocks.

C<SAVEFREEOP> is slightly problematic under this scheme. Sometimes it can cause
an op to be freed after its CV. If the CV has forcibly freed the ops on its
slab and the slab itself, then we will be fiddling with a freed slab. Making
C<SAVEFREEOP> a no-op doesn't help, as sometimes an op can be savefreed when
there is no compilation error, so the op would never be freed. It holds
a reference count on the slab, so the whole slab would leak. So C<SAVEFREEOP>
now sets a special flag on the op (C<< ->op_savefree >>). The forced freeing of
ops after a compilation error won't free any ops thus marked.

Since many pieces of code create tiny subroutines consisting of only a few ops,
and since a huge slab would be quite a bit of baggage for those to carry
around, the first slab is always very small. To avoid allocating too many
slabs for a single CV, each subsequent slab is twice the size of the previous.

Smartmatch expects to be able to allocate an op at run time, run it, and then
throw it away. For that to work the op is simply malloced when C<PL_compcv> hasn't
been set up.
So all slab-allocated ops are marked as such (C<< ->op_slabbed >>),
to distinguish them from malloced ops.


=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed>