=head1 NAME

perlxstypemap - Perl XS C/Perl type mapping

=head1 DESCRIPTION

The more you think about interfacing between two languages, the more
you'll realize that the majority of programmer effort has to go into
converting between the data structures that are native to either of
the languages involved.  This trumps other matters such as differing
calling conventions because the problem space is so much greater.
There are simply more ways to shove data into memory than there are
ways to implement a function call.

Perl XS' attempt at a solution to this is the concept of typemaps.
At an abstract level, a Perl XS typemap is nothing but a recipe for
converting from a certain Perl data structure to a certain C
data structure and vice versa.  Since there can be C types that
are sufficiently similar to one another to warrant converting with
the same logic, XS typemaps are represented by a unique identifier,
henceforth called an B<XS type> in this document.  You can then tell
the XS compiler that multiple C types are to be mapped with the same
XS typemap.

In your XS code, when you define an argument with a C type or when
you are using a C<CODE:> and an C<OUTPUT:> section together with a
C return type of your XSUB, it'll be the typemapping mechanism that
makes this easy.

=head2 Anatomy of a typemap

In more practical terms, the typemap is a collection of code
fragments which are used by the B<xsubpp> compiler to map C function
parameters and values to Perl values.  The typemap file may consist
of three sections labelled C<TYPEMAP>, C<INPUT>, and C<OUTPUT>.
An unlabelled initial section is assumed to be a C<TYPEMAP> section.
The INPUT section tells the compiler how to translate Perl values
into variables of certain C types.  The OUTPUT section tells the
compiler how to translate the values from certain C types into values
Perl can understand.  The TYPEMAP section tells the compiler which
of the INPUT and OUTPUT code fragments should be used to map a given
C type to a Perl value.  The section labels C<TYPEMAP>, C<INPUT>, or
C<OUTPUT> must begin in the first column on a line by themselves,
and must be in uppercase.

Each type of section can appear an arbitrary number of times
and does not have to appear at all.  For example, a typemap may
commonly lack C<INPUT> and C<OUTPUT> sections if all it needs to
do is associate additional C types with core XS types like T_PTROBJ.
Lines that start with a hash C<#> are considered comments and ignored
in the C<TYPEMAP> section, but are considered significant in C<INPUT>
and C<OUTPUT>. Blank lines are generally ignored.
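
As a minimal sketch of such a file, the following typemap has only a
C<TYPEMAP> section and reuses XS types from the core typemap.  The C
type names C<my_handle_t> and C<my_count_t> are hypothetical and stand
in for types declared by your own library:

  TYPEMAP
  # hypothetical library types mapped onto core XS types
  my_handle_t *   T_PTROBJ
  my_count_t      T_UV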

Traditionally, typemaps needed to be written to a separate file,
conventionally called C<typemap> in a CPAN distribution.  With
ExtUtils::ParseXS (the XS compiler) version 3.12 or better, which
comes with perl 5.16, typemaps can also be embedded directly into
XS code using a HERE-doc-like syntax:

  TYPEMAP: <<HERE
  ...
  HERE

where C<HERE> can be replaced by other identifiers, as with normal
Perl HERE-docs.  All details below about the typemap textual format
remain valid.
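
For illustration, an embedded typemap in an XS file might look like
the sketch below.  The module name C<My::Module>, the C type
C<my_handle_t>, and the terminator C<END_TYPEMAP> are made up for this
example:

  MODULE = My::Module    PACKAGE = My::Module

  TYPEMAP: <<END_TYPEMAP
  # associate a hypothetical C type with a core XS type
  my_handle_t *   T_PTROBJ
  END_TYPEMAP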

The C<TYPEMAP> section should contain one pair of C type and
XS type per line as follows.  An example from the core typemap file:

  TYPEMAP
  # all variants of char* are handled by the T_PV typemap
  char *          T_PV
  const char *    T_PV
  unsigned char * T_PV
  ...

The C<INPUT> and C<OUTPUT> sections have identical formats, that is,
each unindented line starts a new in- or output map respectively.
A new in- or output map must start with the name of the XS type to
map on a line by itself, followed by the code that implements it
indented on the following lines. Example:

  INPUT
  T_PV
    $var = ($type)SvPV_nolen($arg)
  T_PTR
    $var = INT2PTR($type,SvIV($arg))

We'll get to the meaning of those Perlish-looking variables in a
little bit.

Finally, here's an example of the full typemap file for mapping C
strings of the C<char *> type to Perl scalars/strings:

  TYPEMAP
  char *  T_PV

  INPUT
  T_PV
    $var = ($type)SvPV_nolen($arg)

  OUTPUT
  T_PV
    sv_setpv((SV*)$arg, $var);

Here's a more complicated example: suppose that you wanted
C<struct netconfig> to be blessed into the class C<Net::Config>.
One way to do this is to use underscores (_) to separate package
names, as follows:

  typedef struct netconfig * Net_Config;

And then provide a typemap entry C<T_PTROBJ_SPECIAL> that maps
underscores to double-colons (::), and declare C<Net_Config> to be of
that type:

  TYPEMAP
  Net_Config      T_PTROBJ_SPECIAL

  INPUT
  T_PTROBJ_SPECIAL
    if (sv_derived_from($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")){
      IV tmp = SvIV((SV*)SvRV($arg));
      $var = INT2PTR($type, tmp);
    }
    else
      croak(\"$var is not of type ${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")

  OUTPUT
  T_PTROBJ_SPECIAL
    sv_setref_pv($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\",
                 (void*)$var);

The INPUT and OUTPUT sections replace underscores with double-colons
on the fly, giving the desired effect.  This example demonstrates some
of the power and versatility of the typemap facility.

The C<INT2PTR> macro (defined in perl.h) casts an integer to a pointer
of a given type, taking care of the possible different size of integers
and pointers.  There are also C<PTR2IV>, C<PTR2UV>, C<PTR2NV> macros,
to map the other way, which may be useful in OUTPUT sections.
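
As a sketch of the reverse direction, an C<OUTPUT> entry that stores a
pointer in a Perl integer could use C<PTR2IV>.  The XS type name
C<T_MY_PTR> is hypothetical and only serves this example:

  OUTPUT
  T_MY_PTR
    sv_setiv($arg, PTR2IV($var));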

=head2 The Role of the typemap File in Your Distribution

The default typemap in the F<lib/ExtUtils> directory of the Perl source
contains many useful types which can be used by Perl extensions.  Some
extensions define additional typemaps which they keep in their own directory.
These additional typemaps may reference INPUT and OUTPUT maps in the main
typemap.  The B<xsubpp> compiler will allow the extension's own typemap to
override any mappings which are in the default typemap.  Instead of using
an additional F<typemap> file, typemaps may be embedded verbatim in XS
with a heredoc-like syntax.  See the documentation on the C<TYPEMAP:> XS
keyword.

For CPAN distributions, you can assume that the XS types defined by
the perl core are already available. Additionally, the core typemap
has default XS types for a large number of C types.  For example, if
you simply return a C<char *> from your XSUB, the core typemap will
have this C type associated with the T_PV XS type.  That means your
C string will be copied into the PV (pointer value) slot of a new scalar
that will be returned from your XSUB to Perl.

If you're developing a CPAN distribution using XS, you may add your own
file called F<typemap> to the distribution.  That file may contain
typemaps that either map types that are specific to your code or that
override the core typemap file's mappings for common C types.

=head2 Sharing typemaps Between CPAN Distributions

Starting with ExtUtils::ParseXS version 3.13_01 (which comes with perl
5.16 and later), it is rather easy to share typemap code between multiple
CPAN distributions. The general idea is to share it as a module that
offers a certain API and have the dependent modules declare that as a
build-time requirement and import the typemap into the XS. An example
of such a typemap-sharing module on CPAN is
C<ExtUtils::Typemaps::Basic>. There are two steps to making that
module's typemaps available in your code:

=over 4

=item *

Declare C<ExtUtils::Typemaps::Basic> as a build-time dependency
in C<Makefile.PL> (use C<BUILD_REQUIRES>), or in your C<Build.PL>
(use C<build_requires>).

=item *

Include the following line in the XS section of your XS file:
(don't break the line)

  INCLUDE_COMMAND: $^X -MExtUtils::Typemaps::Cmd
                   -e "print embeddable_typemap(q{Basic})"

=back

=head2 Writing typemap Entries

Each INPUT or OUTPUT typemap entry is a double-quoted Perl string that
will be evaluated in the presence of certain variables to get the
final C code for mapping a certain C type.

This means that you can embed Perl code in your typemap (C) code using
constructs such as
C<${ perl code that evaluates to scalar reference here }>. A common
use case is to generate error messages that refer to the true function
name even when using the ALIAS XS feature:

  ${ $ALIAS ? \q[GvNAME(CvGV(cv))] : \qq[\"$pname\"] }

For many typemap examples, refer to the core typemap file that can be
found in the perl source tree at F<lib/ExtUtils/typemap>.

The Perl variables that are available for interpolation into typemaps
are the following:

=over 4

=item *

I<$var> - the name of the input or output variable, e.g. RETVAL for
return values.

=item *

I<$type> - the raw C type of the parameter, with any C<:> replaced with
C<_>.
For example, for a type of C<Foo::Bar>, I<$type> is C<Foo__Bar>.

=item *

I<$ntype> - the supplied type with C<*> replaced with C<Ptr>.
For example, for a type of C<Foo*>, I<$ntype> is C<FooPtr>.

=item *

I<$arg> - the stack entry that the parameter is input from or output
to, e.g. C<ST(0)>.

=item *

I<$argoff> - the argument stack offset of the argument, i.e. 0 for the
first argument, etc.

=item *

I<$pname> - the full name of the XSUB, including the C<PACKAGE>
name, with any C<PREFIX> stripped.  This is the non-ALIAS name.

=item *

I<$Package> - the package specified by the most recent C<PACKAGE>
keyword.

=item *

I<$ALIAS> - non-zero if the current XSUB has any aliases declared with
C<ALIAS>.

=back
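
To make the substitution concrete, consider a hypothetical XSUB whose
first parameter is declared as C<char * name> and is therefore mapped
with the C<T_PV> entry shown earlier.  Here I<$var> is C<name>,
I<$type> is C<char *>, and I<$arg> is C<ST(0)>, so the C<INPUT> entry

  $var = ($type)SvPV_nolen($arg)

expands to roughly the following C code (the exact declaration and
whitespace emitted by xsubpp may differ between versions):

  char * name = (char *)SvPV_nolen(ST(0));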

=head2 Full Listing of Core Typemaps

Each C type is represented by an entry in the typemap file that
is responsible for converting perl variables (SV, AV, HV, CV, etc.)
to and from that type. The following sections list all XS types
that come with perl by default.

=over 4

=item T_SV

This simply passes the C representation of the Perl variable (an SV*)
in and out of the XS layer. This can be used if the C code wants
to deal directly with the Perl variable.

=item T_SVREF

Used to pass in and return a reference to an SV.

Note that this typemap does not decrement the reference count
when returning the reference to an SV*.
See also: T_SVREF_REFCOUNT_FIXED

=item T_SVREF_REFCOUNT_FIXED

Used to pass in and return a reference to an SV.
This is a fixed variant of T_SVREF that decrements the refcount
appropriately when returning a reference to an SV*.
Introduced in perl 5.15.4.

=item T_AVREF

From the perl level this is a reference to a perl array.
From the C level this is a pointer to an AV.

Note that this typemap does not decrement the reference count
when returning an AV*. See also: T_AVREF_REFCOUNT_FIXED

=item T_AVREF_REFCOUNT_FIXED

From the perl level this is a reference to a perl array.
From the C level this is a pointer to an AV. This is a fixed
variant of T_AVREF that decrements the refcount appropriately
when returning an AV*. Introduced in perl 5.15.4.
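
As a sketch of how the fixed variant is typically used, a
distribution can remap C<AV *> in its own typemap and then return a
freshly created array without further reference-count bookkeeping.
The XSUB name C<squares> is hypothetical.  With the plain T_AVREF
mapping, the same XSUB would additionally need
C<sv_2mortal((SV*)RETVAL)> to avoid leaking the array.

  TYPEMAP
  AV *    T_AVREF_REFCOUNT_FIXED

The XSUB itself could then look like this:

  AV *
  squares(n)
      int n
    PREINIT:
      int i;
    CODE:
      /* the typemap takes over the one reference held by newAV() */
      RETVAL = newAV();
      for (i = 1; i <= n; i++)
        av_push(RETVAL, newSViv((IV)i * i));
    OUTPUT:
      RETVAL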

=item T_HVREF

From the perl level this is a reference to a perl hash.
From the C level this is a pointer to an HV.

Note that this typemap does not decrement the reference count
when returning an HV*. See also: T_HVREF_REFCOUNT_FIXED

=item T_HVREF_REFCOUNT_FIXED

From the perl level this is a reference to a perl hash.
From the C level this is a pointer to an HV. This is a fixed
variant of T_HVREF that decrements the refcount appropriately
when returning an HV*. Introduced in perl 5.15.4.

=item T_CVREF

From the perl level this is a reference to a perl subroutine
(e.g. $sub = sub { 1 };). From the C level this is a pointer
to a CV.

Note that this typemap does not decrement the reference count
when returning a CV*. See also: T_CVREF_REFCOUNT_FIXED

=item T_CVREF_REFCOUNT_FIXED

From the perl level this is a reference to a perl subroutine
(e.g. $sub = sub { 1 };). From the C level this is a pointer
to a CV.

This is a fixed variant of T_CVREF that decrements the refcount
appropriately when returning a CV*. Introduced in perl 5.15.4.

=item T_SYSRET

The T_SYSRET typemap is used to process return values from system calls.
It is only meaningful when passing values from C to perl (there is
no concept of passing a system return value from Perl to C).

System calls return -1 on error (setting C<errno> with the reason)
and (usually) 0 on success. If the return value is -1 this typemap
returns C<undef>. If the return value is not -1, this typemap
translates a 0 (perl false) to "0 but true" (which
is perl true) or returns the value itself, to indicate that the
command succeeded.

The L<POSIX|POSIX> module makes extensive use of this type.
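
A minimal sketch of an XSUB using this typemap follows.  It assumes
that the C part of the XS file contains C<typedef int SysRet;> and
includes F<unistd.h>, and it relies on the core typemap associating
C<SysRet> with T_SYSRET.  The XSUB name C<my_close> is hypothetical.

  SysRet
  my_close(fd)
      int fd
    CODE:
      RETVAL = close(fd);
    OUTPUT:
      RETVAL

From Perl, a failing call then yields C<undef> with C<$!> set, while a
successful call that returns 0 yields the true value "0 but true".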

=item T_UV

An unsigned integer.

=item T_IV

A signed integer. This is cast to the required integer type when
passed to C and converted to an IV when passed back to Perl.

=item T_INT

A signed integer. This typemap converts the Perl value to a native
integer type (the C<int> type on the current platform). When returning
the value to perl it is processed in the same way as for T_IV.

Its behaviour is identical to using an C<int> type in XS with T_IV.

=item T_ENUM

An enum value. Used to transfer an enum component
from C. There is no reason to pass an enum value to C since
it is stored as an IV inside perl.

=item T_BOOL

A boolean type. This can be used to pass true and false values to and
from C.

=item T_U_INT

This is for unsigned integers. It is equivalent to using T_UV
but explicitly casts the variable to type C<unsigned int>.
The default type for C<unsigned int> is T_UV.

=item T_SHORT

Short integers. This is equivalent to T_IV but explicitly casts
the return to type C<short>. The default typemap for C<short>
is T_IV.

=item T_U_SHORT

Unsigned short integers. This is equivalent to T_UV but explicitly
casts the return to type C<unsigned short>. The default typemap for
C<unsigned short> is T_UV.

T_U_SHORT is used for type C<U16> in the standard typemap.

=item T_LONG

Long integers. This is equivalent to T_IV but explicitly casts
the return to type C<long>. The default typemap for C<long>
is T_IV.

=item T_U_LONG

Unsigned long integers. This is equivalent to T_UV but explicitly
casts the return to type C<unsigned long>. The default typemap for
C<unsigned long> is T_UV.

T_U_LONG is used for type C<U32> in the standard typemap.

=item T_CHAR

Single 8-bit characters.

=item T_U_CHAR

An unsigned byte.

=item T_FLOAT

A floating point number. This typemap guarantees to return a variable
cast to a C<float>.

=item T_NV

A Perl floating point number. Similar to T_IV and T_UV in that the
return type is cast to the requested numeric type rather than
to a specific type.

=item T_DOUBLE

A double precision floating point number. This typemap guarantees to
return a variable cast to a C<double>.

=item T_PV

A string (char *).

=item T_PTR

A memory address (pointer). Typically associated with a C<void *>
type.

=item T_PTRREF

Similar to T_PTR except that the pointer is stored in a scalar and the
reference to that scalar is returned to the caller. This can be used
to hide the actual pointer value from the programmer since it is usually
not required directly from within perl.

The typemap checks that a scalar reference is passed from perl to XS.

=item T_PTROBJ

Similar to T_PTRREF except that the reference is blessed into a class.
This allows the pointer to be used as an object. Most commonly used to
deal with C structs. The typemap checks that the perl object passed
into the XS routine is of the correct class (or part of a subclass).

The pointer is blessed into a class that is derived from the name
of the type of the pointer but with all '*' in the name replaced with
'Ptr'.

For C<DESTROY> XSUBs only, a T_PTROBJ is optimized to a T_PTRREF. This means
the class check is skipped.
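
The following sketch shows the usual shape of a T_PTROBJ-based
object.  It assumes that the C part of the XS file declares
C<typedef struct { int value; } Counter;> and that the typemap
contains the line C<Counter * T_PTROBJ>.  All names here are
hypothetical.  Because C<*> is replaced with C<Ptr>, the methods live
in the package C<CounterPtr>.

  MODULE = Counter    PACKAGE = Counter

  Counter *
  counter_new(start)
      int start
    CODE:
      /* allocate the struct; it is released in DESTROY below */
      Newx(RETVAL, 1, Counter);
      RETVAL->value = start;
    OUTPUT:
      RETVAL

  MODULE = Counter    PACKAGE = CounterPtr

  int
  value(self)
      Counter * self
    CODE:
      RETVAL = self->value;
    OUTPUT:
      RETVAL

  void
  DESTROY(self)
      Counter * self
    CODE:
      Safefree(self);

From Perl, C<Counter::counter_new(5)> would then return an object on
which C<value> can be called as a method.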

=item T_REF_IV_REF

NOT YET

=item T_REF_IV_PTR

Similar to T_PTROBJ in that the pointer is blessed into a scalar object.
The difference is that when the object is passed back into XS it must be
of the correct type (inheritance is not supported) while T_PTROBJ supports
inheritance.

The pointer is blessed into a class that is derived from the name
of the type of the pointer but with all '*' in the name replaced with
'Ptr'.

For C<DESTROY> XSUBs only, a T_REF_IV_PTR is optimized to a T_PTRREF. This
means the class check is skipped.

=item T_PTRDESC

NOT YET

=item T_REFREF

Similar to T_PTRREF, except the pointer stored in the referenced scalar
is dereferenced and copied to the output variable. This means that
T_REFREF is to T_PTRREF as T_OPAQUE is to T_OPAQUEPTR. All clear?

Only the INPUT part of this is implemented (Perl to XSUB) and there
are no known users in core or on CPAN.

=item T_REFOBJ

Like T_REFREF, except it does strict type checking (inheritance is not
supported).

For C<DESTROY> XSUBs only, a T_REFOBJ is optimized to a T_REFREF. This means
the class check is skipped.

=item T_OPAQUEPTR

This can be used to store bytes in the string component of the
SV. Here the representation of the data is irrelevant to perl and the
bytes themselves are just stored in the SV. It is assumed that the C
variable is a pointer (the bytes are copied from that memory
location).  If the pointer is pointing to something that is
represented by 8 bytes then those 8 bytes are stored in the SV (and
length() will report a value of 8). This entry is similar to T_OPAQUE.

In principle the unpack() command can be used to convert the bytes
back to a number (if the underlying type is known to be a number).

This entry can be used to store a C structure (the number
of bytes to be copied is calculated using the C C<sizeof> operator)
and can be used as an alternative to T_PTRREF without having to worry
about a memory leak (since Perl will clean up the SV).

=item T_OPAQUE

This can be used to store data from non-pointer types in the string
part of an SV. It is similar to T_OPAQUEPTR except that the
typemap retrieves the pointer directly rather than assuming it
is being supplied. For example, if an integer is imported into
Perl using T_OPAQUE rather than T_IV the underlying bytes representing
the integer will be stored in the SV but the actual integer value will
not be available, i.e. the data is opaque to perl.

The data may be retrieved using the C<unpack> function if the
underlying type of the byte stream is known.

T_OPAQUE supports input and output of simple types.
T_OPAQUEPTR can be used to pass these bytes back into C if a pointer
is acceptable.

=item Implicit array

xsubpp supports a special syntax for returning
packed C arrays to perl. If the XS return type is given as

  array(type, nelem)

xsubpp will copy the contents of C<nelem * sizeof(type)> bytes from
RETVAL to an SV and push it onto the stack. This is only really useful
if the number of items to be returned is known at compile time and you
don't mind having a string of bytes in your SV.  Use T_ARRAY to push a
variable number of arguments onto the return stack (they won't be
packed as a single string though).

This is similar to using T_OPAQUEPTR but can be used to process more
than one element.
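
A sketch of an XSUB using this syntax is shown below.  The XSUB name
C<three_ints> and the local variable C<values> are hypothetical.  The
bytes are copied out of RETVAL before the XSUB returns, so pointing
RETVAL at a local array is assumed to be safe here.

  array(int,3)
  three_ints(a, b, c)
      int a
      int b
      int c
    PREINIT:
      int values[3];
    CODE:
      values[0] = a;
      values[1] = b;
      values[2] = c;
      /* RETVAL points at 3 * sizeof(int) bytes to be copied */
      RETVAL = values;
    OUTPUT:
      RETVAL

On the Perl side the result is a packed string, which could be decoded
with C<unpack("i3", $packed)>.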

=item T_PACKED

Calls user-supplied functions for conversion. For C<OUTPUT>
(XSUB to Perl), a function named C<XS_pack_$ntype> is called
with the output Perl scalar and the C variable to convert from.
C<$ntype> is the normalized C type that is to be mapped to
Perl. Normalized means that all C<*> are replaced by the
string C<Ptr>. The return value of the function is ignored.

Conversely for C<INPUT> (Perl to XSUB) mapping, the
function named C<XS_unpack_$ntype> is called with the input Perl
scalar as argument and the return value is cast to the mapped
C type and assigned to the output C variable.

An example conversion function for a typemapped struct
C<foo_t *> might be:

  static void
  XS_pack_foo_tPtr(SV *out, foo_t *in)
  {
    dTHX; /* alas, signature does not include pTHX_ */
    HV* hash = newHV();
    hv_stores(hash, "int_member", newSViv(in->int_member));
    hv_stores(hash, "float_member", newSVnv(in->float_member));
    /* ... */

    /* mortalize as thy stack is not refcounted */
    sv_setsv(out, sv_2mortal(newRV_noinc((SV*)hash)));
  }

The conversion from Perl to C is left as an exercise to the reader,
but the prototype would be:

  static foo_t *
  XS_unpack_foo_tPtr(SV *in);

Instead of an actual C function that has to fetch the thread context
using C<dTHX>, you can define macros of the same name and avoid the
overhead. Also, remember that you may need to free the memory
allocated by C<XS_unpack_foo_tPtr>.
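
One possible shape for that exercise is sketched below.  It assumes
that C<foo_t> has the C<int_member> and C<float_member> fields used
above and that the Perl side passes a hash reference.  The error
message and the choice of C<Newxz> for allocation are illustrative
only.

  static foo_t *
  XS_unpack_foo_tPtr(SV *in)
  {
    dTHX; /* signature does not include pTHX_ */
    HV *hash;
    SV **svp;
    foo_t *out;

    if (!SvROK(in) || SvTYPE(SvRV(in)) != SVt_PVHV)
      croak("expected a hash reference");
    hash = (HV *)SvRV(in);

    /* zero-filled allocation; the XSUB must Safefree() it later */
    Newxz(out, 1, foo_t);

    /* missing keys simply leave the zeroed default in place */
    if ((svp = hv_fetchs(hash, "int_member", 0)))
      out->int_member = SvIV(*svp);
    if ((svp = hv_fetchs(hash, "float_member", 0)))
      out->float_member = SvNV(*svp);

    return out;
  }

An XSUB that receives such a C<foo_t *> would typically release it
with C<Safefree> in a C<CLEANUP:> section.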

=item T_PACKEDARRAY

T_PACKEDARRAY is similar to T_PACKED. In fact, the C<INPUT> (Perl
to XSUB) typemap is identical, but the C<OUTPUT> typemap passes
an additional argument to the C<XS_pack_$ntype> function. This
third parameter indicates the number of elements in the output
so that the function can handle C arrays sanely. The variable
needs to be declared by the user and must have the name
C<count_$ntype> where C<$ntype> is the normalized C type name
as explained above. For the example above and C<foo_t **>, the
signature of the function would be:

  static void
  XS_pack_foo_tPtrPtr(SV *out, foo_t **in, UV count_foo_tPtrPtr);

The type of the third parameter is arbitrary as far as the typemap
is concerned. It just has to be in line with the declared variable.

Of course, unless you know the number of elements in the
C<sometype **> C array, within your XSUB, the return value from
C<foo_t ** XS_unpack_foo_tPtrPtr(...)> will be hard to decipher.
Since the details are all up to the XS author (the typemap user),
there are several solutions, none of which is particularly elegant.
The most commonly seen solution has been to allocate memory for
N+1 pointers and assign C<NULL> to the (N+1)th to facilitate
iteration.

Alternatively, using a customized typemap for your purposes in
the first place is probably preferable.

=item T_DATAUNIT

NOT YET

=item T_CALLBACK

NOT YET

=item T_ARRAY

This is used to convert the perl argument list to a C array
and for pushing the contents of a C array onto the perl
argument stack.

The usual calling signature is

  @out = array_func( @in );

Any number of arguments can occur in the list before the array but
the input and output arrays must be the last elements in the list.

When used to pass a perl list to C the XS writer must provide a
function (named after the array type but with 'Ptr' substituted for
'*') to allocate the memory required to hold the list. A pointer
should be returned. It is up to the XS writer to free the memory on
exit from the function. The variable C<ix_$var> is set to the number
of elements in the new array.

When returning a C array to Perl the XS writer must provide an integer
variable called C<size_$var> containing the number of elements in the
array. This is used to determine how many elements should be pushed
onto the return argument stack. This is not required on input since
Perl knows how many arguments are on the stack when the routine is
called. Ordinarily this variable would be called C<size_RETVAL>.

Additionally, the type of each element is determined from the type of
the array. If the array uses type C<intArray *> xsubpp will
automatically work out that it contains variables of type C<int> and
use that typemap entry to perform the copy of each element. All
pointer '*' and 'Array' tags are removed from the name to determine
the subtype.
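
The pieces described above might fit together as in the following
sketch.  It assumes that the typemap maps C<intArray *> to T_ARRAY.
The XSUB name C<echo_ints> is hypothetical.  The C part of the XS file
supplies the element type and the allocator:

  typedef int intArray;

  /* allocator named after the type, with '*' replaced by 'Ptr' */
  intArray *
  intArrayPtr(int nelem)
  {
    intArray *array;
    Newx(array, nelem, intArray);
    return array;
  }

The XSUB then takes the list on input and pushes it back on output:

  intArray *
  echo_ints(array, ...)
      intArray * array
    PREINIT:
      U32 size_RETVAL;
    CODE:
      /* ix_array was set by the INPUT typemap */
      size_RETVAL = ix_array;
      RETVAL = array;
    OUTPUT:
      RETVAL
    CLEANUP:
      Safefree(array);
      XSRETURN(size_RETVAL);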

=item T_STDIO

This is used for passing perl filehandles to and from C using
C<FILE *> structures.

=item T_INOUT

This is used for passing perl filehandles to and from C using
C<PerlIO *> structures. The file handle can be used for reading and
writing. This corresponds to the C<+E<lt>> mode, see also T_IN
and T_OUT.

See L<perliol> for more information on the Perl IO abstraction
layer. Perl must have been built with C<-Duseperlio>.

There is no check to assert that the filehandle passed from Perl
to C was created with the right C<open()> mode.

Hint: The L<perlxstut> tutorial covers the T_INOUT, T_IN, and T_OUT
XS types nicely.

=item T_IN

Same as T_INOUT, but the filehandle that is returned from C to Perl
can only be used for reading (mode C<E<lt>>).

=item T_OUT

Same as T_INOUT, but the filehandle that is returned from C to Perl
is set to use the open mode C<+E<gt>>.

=back