=head1 NAME

perlxstypemap - Perl XS C/Perl type mapping

=head1 DESCRIPTION

The more you think about interfacing between two languages, the more
you'll realize that the majority of programmer effort has to go into
converting between the data structures that are native to either of
the languages involved. This trumps other matters such as differing
calling conventions because the problem space is so much greater.
There are simply more ways to shove data into memory than there are
ways to implement a function call.

Perl XS' attempt at a solution to this is the concept of typemaps.
At an abstract level, a Perl XS typemap is nothing but a recipe for
converting from a certain Perl data structure to a certain C
data structure and vice versa. Since there can be C types that
are sufficiently similar to one another to warrant converting with
the same logic, XS typemaps are represented by a unique identifier,
henceforth called an B<XS type> in this document. You can then tell
the XS compiler that multiple C types are to be mapped with the same
XS typemap.

In your XS code, when you define an argument with a C type or when
you are using a C<CODE:> and an C<OUTPUT:> section together with a
C return type of your XSUB, it'll be the typemapping mechanism that
makes this easy.

=head2 Anatomy of a typemap

In more practical terms, the typemap is a collection of code
fragments which are used by the B<xsubpp> compiler to map C function
parameters and values to Perl values. The typemap file may consist
of three sections labelled C<TYPEMAP>, C<INPUT>, and C<OUTPUT>.
An unlabelled initial section is assumed to be a C<TYPEMAP> section.
The INPUT section tells the compiler how to translate Perl values
into variables of certain C types. The OUTPUT section tells the
compiler how to translate the values from certain C types into values
Perl can understand.
The TYPEMAP section tells the compiler which
of the INPUT and OUTPUT code fragments should be used to map a given
C type to a Perl value. The section labels C<TYPEMAP>, C<INPUT>, or
C<OUTPUT> must begin in the first column on a line by themselves,
and must be in uppercase.

Each type of section can appear an arbitrary number of times
and does not have to appear at all. For example, a typemap may
commonly lack C<INPUT> and C<OUTPUT> sections if all it needs to
do is associate additional C types with core XS types like T_PTROBJ.
Lines that start with a hash C<#> are considered comments and ignored
in the C<TYPEMAP> section, but are considered significant in C<INPUT>
and C<OUTPUT>. Blank lines are generally ignored.

Traditionally, typemaps needed to be written to a separate file,
conventionally called C<typemap> in a CPAN distribution. With
ExtUtils::ParseXS (the XS compiler) version 3.12 or better which
comes with perl 5.16, typemaps can also be embedded directly into
XS code using a HERE-doc like syntax:

  TYPEMAP: <<HERE
  ...
  HERE

where C<HERE> can be replaced by other identifiers, as with normal
Perl HERE-docs. All details below about the typemap textual format
remain valid.

The C<TYPEMAP> section should contain one pair of C type and
XS type per line as follows. An example from the core typemap file:

  TYPEMAP
  # all variants of char* are handled by the T_PV typemap
  char *          T_PV
  const char *    T_PV
  unsigned char * T_PV
  ...

The C<INPUT> and C<OUTPUT> sections have identical formats, that is,
each unindented line starts a new in- or output map respectively.
A new in- or output map must start with the name of the XS type to
map on a line by itself, followed by the code that implements it
indented on the following lines.
Example:

  INPUT
  T_PV
    $var = ($type)SvPV_nolen($arg)
  T_PTR
    $var = INT2PTR($type,SvIV($arg))

We'll get to the meaning of those Perlish-looking variables in a
little bit.

Finally, here's an example of the full typemap file for mapping C
strings of the C<char *> type to Perl scalars/strings:

  TYPEMAP
  char *  T_PV

  INPUT
  T_PV
    $var = ($type)SvPV_nolen($arg)

  OUTPUT
  T_PV
    sv_setpv((SV*)$arg, $var);

Here's a more complicated example: suppose that you wanted
C<struct netconfig> to be blessed into the class C<Net::Config>.
One way to do this is to use underscores (_) to separate package
names, as follows:

  typedef struct netconfig * Net_Config;

And then provide a typemap entry C<T_PTROBJ_SPECIAL> that maps
underscores to double-colons (::), and declare C<Net_Config> to be of
that type:

  TYPEMAP
  Net_Config      T_PTROBJ_SPECIAL

  INPUT
  T_PTROBJ_SPECIAL
    if (sv_derived_from($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")){
      IV tmp = SvIV((SV*)SvRV($arg));
      $var = INT2PTR($type, tmp);
    }
    else
      croak(\"$var is not of type ${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")

  OUTPUT
  T_PTROBJ_SPECIAL
    sv_setref_pv($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\",
                 (void*)$var);

The INPUT and OUTPUT sections replace underscores with double-colons
on the fly, giving the desired effect. This example demonstrates some
of the power and versatility of the typemap facility.

The C<INT2PTR> macro (defined in perl.h) casts an integer to a pointer
of a given type, taking care of the possibly different sizes of integers
and pointers. There are also C<PTR2IV>, C<PTR2UV>, C<PTR2NV> macros,
to map the other way, which may be useful in OUTPUT sections.
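
As a brief sketch of how these macros pair up, the following typemap
round-trips a pointer through a Perl integer, using C<INT2PTR> on
input and C<PTR2IV> on output. The C type C<my_handle_t> and the XS
type C<T_MY_HANDLE> are hypothetical names chosen for illustration
only; they are not part of the core typemap.

  TYPEMAP
  # my_handle_t and T_MY_HANDLE are hypothetical names
  my_handle_t *   T_MY_HANDLE

  INPUT
  T_MY_HANDLE
    $var = INT2PTR($type, SvIV($arg))

  OUTPUT
  T_MY_HANDLE
    sv_setiv($arg, PTR2IV($var));

Storing a raw address in a plain scalar like this performs no type
checking at all, which is one reason the blessed-reference approach
shown above is usually the safer choice.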

=head2 The Role of the typemap File in Your Distribution

The default typemap in the F<lib/ExtUtils> directory of the Perl source
contains many useful types which can be used by Perl extensions. Some
extensions define additional typemaps which they keep in their own directory.
These additional typemaps may reference INPUT and OUTPUT maps in the main
typemap. The B<xsubpp> compiler will allow the extension's own typemap to
override any mappings which are in the default typemap. Instead of using
an additional F<typemap> file, typemaps may be embedded verbatim in XS
with a heredoc-like syntax. See the documentation on the C<TYPEMAP:> XS
keyword.

For CPAN distributions, you can assume that the XS types defined by
the perl core are already available. Additionally, the core typemap
has default XS types for a large number of C types. For example, if
you simply return a C<char *> from your XSUB, the core typemap will
have this C type associated with the T_PV XS type. That means your
C string will be copied into the PV (pointer value) slot of a new scalar
that will be returned from your XSUB to Perl.

If you're developing a CPAN distribution using XS, you may add your own
file called F<typemap> to the distribution. That file may contain
typemaps that either map types that are specific to your code or that
override the core typemap file's mappings for common C types.

=head2 Sharing typemaps Between CPAN Distributions

Starting with ExtUtils::ParseXS version 3.13_01 (comes with perl 5.16
and better), it is rather easy to share typemap code between multiple
CPAN distributions. The general idea is to share it as a module that
offers a certain API and have the dependent modules declare that as a
build-time requirement and import the typemap into the XS. An example
of such a typemap-sharing module on CPAN is
C<ExtUtils::Typemaps::Basic>.
Two steps make that module's
typemaps available in your code:

=over 4

=item *

Declare C<ExtUtils::Typemaps::Basic> as a build-time dependency
in C<Makefile.PL> (use C<BUILD_REQUIRES>), or in your C<Build.PL>
(use C<build_requires>).

=item *

Include the following line in the XS section of your XS file:
(don't break the line)

  INCLUDE_COMMAND: $^X -MExtUtils::Typemaps::Cmd
                   -e "print embeddable_typemap(q{Basic})"

=back

=head2 Writing typemap Entries

Each INPUT or OUTPUT typemap entry is a double-quoted Perl string that
will be evaluated in the presence of certain variables to get the
final C code for mapping a certain C type.

This means that you can embed Perl code in your typemap (C) code using
constructs such as
C<${ perl code that evaluates to scalar reference here }>. A common
use case is to generate error messages that refer to the true function
name even when using the ALIAS XS feature:

  ${ $ALIAS ? \q[GvNAME(CvGV(cv))] : \qq[\"$pname\"] }

For many typemap examples, refer to the core typemap file that can be
found in the perl source tree at F<lib/ExtUtils/typemap>.

The Perl variables that are available for interpolation into typemaps
are the following:

=over 4

=item *

I<$var> - the name of the input or output variable, e.g. RETVAL for
return values.

=item *

I<$type> - the raw C type of the parameter, any C<:> replaced with
C<_>.
e.g. for a type of C<Foo::Bar>, I<$type> is C<Foo__Bar>

=item *

I<$ntype> - the supplied type with C<*> replaced with C<Ptr>.
e.g. for a type of C<Foo*>, I<$ntype> is C<FooPtr>

=item *

I<$arg> - the stack entry that the parameter is input from or output
to, e.g. C<ST(0)>

=item *

I<$argoff> - the argument stack offset of the argument, i.e. 0 for the
first argument, etc.

=item *

I<$pname> - the full name of the XSUB, including the C<PACKAGE>
name, with any C<PREFIX> stripped. This is the non-ALIAS name.

=item *

I<$Package> - the package specified by the most recent C<PACKAGE>
keyword.

=item *

I<$ALIAS> - non-zero if the current XSUB has any aliases declared with
C<ALIAS>.

=back

=head2 Full Listing of Core Typemaps

Each C type is represented by an entry in the typemap file that
is responsible for converting perl variables (SV, AV, HV, CV, etc.)
to and from that type. The following sections list all XS types
that come with perl by default.

=over 4

=item T_SV

This simply passes the C representation of the Perl variable (an SV*)
in and out of the XS layer. This can be used if the C code wants
to deal directly with the Perl variable.

=item T_SVREF

Used to pass in and return a reference to an SV.

Note that this typemap does not decrement the reference count
when returning the reference to an SV*.
See also: T_SVREF_REFCOUNT_FIXED

=item T_SVREF_REFCOUNT_FIXED

Used to pass in and return a reference to an SV.
This is a fixed
variant of T_SVREF that decrements the refcount appropriately
when returning a reference to an SV*. Introduced in perl 5.15.4.

=item T_AVREF

From the perl level this is a reference to a perl array.
From the C level this is a pointer to an AV.

Note that this typemap does not decrement the reference count
when returning an AV*. See also: T_AVREF_REFCOUNT_FIXED

=item T_AVREF_REFCOUNT_FIXED

From the perl level this is a reference to a perl array.
From the C level this is a pointer to an AV. This is a fixed
variant of T_AVREF that decrements the refcount appropriately
when returning an AV*. Introduced in perl 5.15.4.

=item T_HVREF

From the perl level this is a reference to a perl hash.
From the C level this is a pointer to an HV.

Note that this typemap does not decrement the reference count
when returning an HV*. See also: T_HVREF_REFCOUNT_FIXED

=item T_HVREF_REFCOUNT_FIXED

From the perl level this is a reference to a perl hash.
From the C level this is a pointer to an HV. This is a fixed
variant of T_HVREF that decrements the refcount appropriately
when returning an HV*. Introduced in perl 5.15.4.

=item T_CVREF

From the perl level this is a reference to a perl subroutine
(e.g. $sub = sub { 1 };). From the C level this is a pointer
to a CV.

Note that this typemap does not decrement the reference count
when returning a CV*. See also: T_CVREF_REFCOUNT_FIXED

=item T_CVREF_REFCOUNT_FIXED

From the perl level this is a reference to a perl subroutine
(e.g. $sub = sub { 1 };). From the C level this is a pointer
to a CV.

This is a fixed
variant of T_CVREF that decrements the refcount appropriately
when returning a CV*. Introduced in perl 5.15.4.

=item T_SYSRET

The T_SYSRET typemap is used to process return values from system calls.
It is only meaningful when passing values from C to perl (there is
no concept of passing a system return value from Perl to C).

System calls return -1 on error (setting C<errno> with the reason)
and (usually) 0 on success. If the return value is -1 this typemap
returns C<undef>. If the return value is not -1, this typemap
translates a 0 (perl false) to "0 but true" (which
is perl true) or returns the value itself, to indicate that the
command succeeded.

The L<POSIX|POSIX> module makes extensive use of this type.

=item T_UV

An unsigned integer.

=item T_IV

A signed integer. This is cast to the required integer type when
passed to C and converted to an IV when passed back to Perl.

=item T_INT

A signed integer.
This typemap converts the Perl value to a native
integer type (the C<int> type on the current platform). When returning
the value to perl it is processed in the same way as for T_IV.

Its behaviour is identical to using an C<int> type in XS with T_IV.

=item T_ENUM

An enum value. Used to transfer an enum component
from C. There is no reason to pass an enum value to C since
it is stored as an IV inside perl.

=item T_BOOL

A boolean type. This can be used to pass true and false values to and
from C.

=item T_U_INT

This is for unsigned integers. It is equivalent to using T_UV
but explicitly casts the variable to type C<unsigned int>.
The default type for C<unsigned int> is T_UV.

=item T_SHORT

Short integers. This is equivalent to T_IV but explicitly casts
the return to type C<short>. The default typemap for C<short>
is T_IV.

=item T_U_SHORT

Unsigned short integers. This is equivalent to T_UV but explicitly
casts the return to type C<unsigned short>. The default typemap for
C<unsigned short> is T_UV.

T_U_SHORT is used for type C<U16> in the standard typemap.

=item T_LONG

Long integers. This is equivalent to T_IV but explicitly casts
the return to type C<long>. The default typemap for C<long>
is T_IV.

=item T_U_LONG

Unsigned long integers. This is equivalent to T_UV but explicitly
casts the return to type C<unsigned long>. The default typemap for
C<unsigned long> is T_UV.

T_U_LONG is used for type C<U32> in the standard typemap.

=item T_CHAR

Single 8-bit characters.

=item T_U_CHAR

An unsigned byte.

=item T_FLOAT

A floating point number. This typemap guarantees to return a variable
cast to a C<float>.

=item T_NV

A Perl floating point number.
Similar to T_IV and T_UV in that the
return type is cast to the requested numeric type rather than
to a specific type.

=item T_DOUBLE

A double precision floating point number. This typemap guarantees to
return a variable cast to a C<double>.

=item T_PV

A string (char *).

=item T_PTR

A memory address (pointer). Typically associated with a C<void *>
type.

=item T_PTRREF

Similar to T_PTR except that the pointer is stored in a scalar and the
reference to that scalar is returned to the caller. This can be used
to hide the actual pointer value from the programmer since it is usually
not required directly from within perl.

The typemap checks that a scalar reference is passed from perl to XS.

=item T_PTROBJ

Similar to T_PTRREF except that the reference is blessed into a class.
This allows the pointer to be used as an object. Most commonly used to
deal with C structs. The typemap checks that the perl object passed
into the XS routine is of the correct class (or part of a subclass).

The pointer is blessed into a class that is derived from the name
of the type of the pointer but with all '*' in the name replaced with
'Ptr'.

For C<DESTROY> XSUBs only, a T_PTROBJ is optimized to a T_PTRREF. This means
the class check is skipped.

=item T_REF_IV_REF

NOT YET

=item T_REF_IV_PTR

Similar to T_PTROBJ in that the pointer is blessed into a scalar object.
The difference is that when the object is passed back into XS it must be
of the correct type (inheritance is not supported) while T_PTROBJ supports
inheritance.

The pointer is blessed into a class that is derived from the name
of the type of the pointer but with all '*' in the name replaced with
'Ptr'.

For C<DESTROY> XSUBs only, a T_REF_IV_PTR is optimized to a T_PTRREF. This
means the class check is skipped.

=item T_PTRDESC

NOT YET

=item T_REFREF

Similar to T_PTRREF, except the pointer stored in the referenced scalar
is dereferenced and copied to the output variable. This means that
T_REFREF is to T_PTRREF as T_OPAQUE is to T_OPAQUEPTR. All clear?

Only the INPUT part of this is implemented (Perl to XSUB) and there
are no known users in core or on CPAN.

=item T_REFOBJ

Like T_REFREF, except it does strict type checking (inheritance is not
supported).

For C<DESTROY> XSUBs only, a T_REFOBJ is optimized to a T_REFREF. This means
the class check is skipped.

=item T_OPAQUEPTR

This can be used to store bytes in the string component of the
SV. Here the representation of the data is irrelevant to perl and the
bytes themselves are just stored in the SV. It is assumed that the C
variable is a pointer (the bytes are copied from that memory
location). If the pointer is pointing to something that is
represented by 8 bytes then those 8 bytes are stored in the SV (and
length() will report a value of 8). This entry is similar to T_OPAQUE.

In principle the unpack() command can be used to convert the bytes
back to a number (if the underlying type is known to be a number).

This entry can be used to store a C structure (the number
of bytes to be copied is calculated using the C C<sizeof> function)
and can be used as an alternative to T_PTRREF without having to worry
about a memory leak (since Perl will clean up the SV).

=item T_OPAQUE

This can be used to store data from non-pointer types in the string
part of an SV. It is similar to T_OPAQUEPTR except that the
typemap retrieves the pointer directly rather than assuming it
is being supplied.
For example, if an integer is imported into
Perl using T_OPAQUE rather than T_IV the underlying bytes representing
the integer will be stored in the SV but the actual integer value will
not be available, i.e. the data is opaque to perl.

The data may be retrieved using the C<unpack> function if the
underlying type of the byte stream is known.

T_OPAQUE supports input and output of simple types.
T_OPAQUEPTR can be used to pass these bytes back into C if a pointer
is acceptable.

=item Implicit array

xsubpp supports a special syntax for returning
packed C arrays to perl. If the XS return type is given as

  array(type, nelem)

xsubpp will copy the contents of C<nelem * sizeof(type)> bytes from
RETVAL to an SV and push it onto the stack. This is only really useful
if the number of items to be returned is known at compile time and you
don't mind having a string of bytes in your SV. Use T_ARRAY to push a
variable number of arguments onto the return stack (they won't be
packed as a single string though).

This is similar to using T_OPAQUEPTR but can be used to process more
than one element.

=item T_PACKED

Calls user-supplied functions for conversion. For C<OUTPUT>
(XSUB to Perl), a function named C<XS_pack_$ntype> is called
with the output Perl scalar and the C variable to convert from.
C<$ntype> is the normalized C type that is to be mapped to
Perl. Normalized means that all C<*> are replaced by the
string C<Ptr>. The return value of the function is ignored.

Conversely for C<INPUT> (Perl to XSUB) mapping, the
function named C<XS_unpack_$ntype> is called with the input Perl
scalar as argument and the return value is cast to the mapped
C type and assigned to the output C variable.

An example conversion function for a typemapped struct
C<foo_t *> might be:

  static void
  XS_pack_foo_tPtr(SV *out, foo_t *in)
  {
    dTHX; /* alas, signature does not include pTHX_ */
    HV* hash = newHV();
    hv_stores(hash, "int_member", newSViv(in->int_member));
    hv_stores(hash, "float_member", newSVnv(in->float_member));
    /* ... */

    /* mortalize as thy stack is not refcounted */
    sv_setsv(out, sv_2mortal(newRV_noinc((SV*)hash)));
  }

The conversion from Perl to C is left as an exercise to the reader,
but the prototype would be:

  static foo_t *
  XS_unpack_foo_tPtr(SV *in);

Instead of an actual C function that has to fetch the thread context
using C<dTHX>, you can define macros of the same name and avoid the
overhead. Also, remember that you may need to free the memory
allocated by C<XS_unpack_foo_tPtr>.

=item T_PACKEDARRAY

T_PACKEDARRAY is similar to T_PACKED. In fact, the C<INPUT> (Perl
to XSUB) typemap is identical, but the C<OUTPUT> typemap passes
an additional argument to the C<XS_pack_$ntype> function. This
third parameter indicates the number of elements in the output
so that the function can handle C arrays sanely. The variable
needs to be declared by the user and must have the name
C<count_$ntype> where C<$ntype> is the normalized C type name
as explained above. For the example above and C<foo_t **>, the
signature of the function would be:

  static void
  XS_pack_foo_tPtrPtr(SV *out, foo_t **in, UV count_foo_tPtrPtr);

The type of the third parameter is arbitrary as far as the typemap
is concerned. It just has to be in line with the declared variable.

Of course, unless you know the number of elements in the
C<sometype **> C array, within your XSUB, the return value from
C<foo_t ** XS_unpack_foo_tPtrPtr(...)> will be hard to decipher.
Since the details are all up to the XS author (the typemap user),
there are several solutions, none of which is particularly elegant.
The most commonly seen solution has been to allocate memory for
N+1 pointers and assign C<NULL> to the (N+1)th to facilitate
iteration.

Alternatively, using a customized typemap for your purposes in
the first place is probably preferable.

=item T_DATAUNIT

NOT YET

=item T_CALLBACK

NOT YET

=item T_ARRAY

This is used to convert the perl argument list to a C array
and for pushing the contents of a C array onto the perl
argument stack.

The usual calling signature is

  @out = array_func( @in );

Any number of arguments can occur in the list before the array but
the input and output arrays must be the last elements in the list.

When used to pass a perl list to C the XS writer must provide a
function (named after the array type but with 'Ptr' substituted for
'*') to allocate the memory required to hold the list. A pointer
should be returned. It is up to the XS writer to free the memory on
exit from the function. The variable C<ix_$var> is set to the number
of elements in the new array.

When returning a C array to Perl the XS writer must provide an integer
variable called C<size_$var> containing the number of elements in the
array. This is used to determine how many elements should be pushed
onto the return argument stack. This is not required on input since
Perl knows how many arguments are on the stack when the routine is
called. Ordinarily this variable would be called C<size_RETVAL>.

Additionally, the type of each element is determined from the type of
the array. If the array uses type C<intArray *> xsubpp will
automatically work out that it contains variables of type C<int> and
use that typemap entry to perform the copy of each element.
All
pointer '*' and 'Array' tags are removed from the name to determine
the subtype.

=item T_STDIO

This is used for passing perl filehandles to and from C using
C<FILE *> structures.

=item T_INOUT

This is used for passing perl filehandles to and from C using
C<PerlIO *> structures. The file handle can be used for reading and
writing. This corresponds to the C<+E<lt>> mode, see also T_IN
and T_OUT.

See L<perliol> for more information on the Perl IO abstraction
layer. Perl must have been built with C<-Duseperlio>.

There is no check to assert that the filehandle passed from Perl
to C was created with the right C<open()> mode.

Hint: The L<perlxstut> tutorial covers the T_INOUT, T_IN, and T_OUT
XS types nicely.

=item T_IN

Same as T_INOUT, but the filehandle that is returned from C to Perl
can only be used for reading (mode C<E<lt>>).

=item T_OUT

Same as T_INOUT, but the filehandle that is returned from C to Perl
is set to use the open mode C<+E<gt>>.

=back
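
To tie the listing back to practice, here is a minimal sketch of a
distribution's F<typemap> file that needs only a C<TYPEMAP> section,
because it merely associates its own C types with XS types from the
list above. All C type names are hypothetical and stand in for
whatever your library declares.

  TYPEMAP
  # hypothetical C types from an imaginary library
  my_counter_t      T_UV
  my_ratio_t        T_DOUBLE
  my_handle_t *     T_PTROBJ

Under the T_PTROBJ rules described above, a C<my_handle_t *> returned
from an XSUB would be blessed into the class C<my_handle_tPtr>, so a
typedef with a friendlier name may be worth considering.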