.\" $NetBSD: ctf.5,v 1.3 2015/09/29 06:33:01 wiz Exp $
.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2014 Joyent, Inc.
.\"
.Dd September 26, 2014
.Dt CTF 5
.Os
.Sh NAME
.Nm ctf
.Nd Compact C Type Format
.Sh SYNOPSIS
.In sys/ctf.h
.Sh DESCRIPTION
.Nm
is designed to be a compact representation of the C programming
language's type information focused on serving the needs of dynamic
tracing, debuggers, and other in-situ and post-mortem introspection
tools.
.Nm
data is generally included in
.Sy ELF
objects and is tagged as
.Sy SHT_PROGBITS
to ensure that the data is accessible in a running process and in subsequent
core dumps, if generated.
.Lp
The
.Nm
data contained in each file has information about the layout and
sizes of C types, including intrinsic types, enumerations, structures,
typedefs, and unions, that are used by the corresponding
.Sy ELF
object.
The
.Nm
data may also include information about the types of global objects and
the return type and arguments of functions in the symbol table.
.Lp
Because a
.Nm
file is often embedded inside a file, rather than being a standalone
file itself, it may also be referred to as a
.Nm
.Sy container .
.Lp
On illumos systems,
.Nm
data is consumed by multiple programs.
It can be used by
.\" the modular
.\" debugger,
.\" .Xr mdb 1 ,
.\" as well as by
.Xr dtrace 1 .
Programmatic access to
.Nm
data can be obtained through
.Xr libctf 3 .
.Lp
The
.Nm
file format is broken down into seven different sections.
The first two
sections are the
.Sy preamble
and
.Sy header ,
which describe the version of the
.Nm
file, links it has to other
.Nm
files, and the sizes of the other sections.
The next section is the
.Sy label
section,
which provides a way of identifying similar groups of
.Nm
data across multiple files.
This is followed by the
.Sy object
information section, which describes the types of global
symbols.
The subsequent section is the
.Sy function
information section, which describes the return
types and arguments of functions.
The next section is the
.Sy type
information section, which describes
the format and layout of the C types themselves, and finally the last
section is the
.Sy string
section, which contains the names of types, enumerations, members, and
labels.
.Lp
Strictly speaking, only the
.Sy preamble
and
.Sy header
are required; to be actually useful, however, both the type and string
sections are necessary.
.Lp
A
.Nm
file may contain all of the type information that it requires, or it
may optionally refer to another
.Nm
file which holds the remaining types.
When a
.Nm
file refers to another file, it is called the
.Sy child
and the file it refers to is called the
.Sy parent .
A given file may only refer to one parent.
This process is called
.Em uniquification
because it ensures each child only has type information that is
unique to it.
A common example of this is that most kernel modules in
illumos are uniquified against the kernel module
.Sy genunix
and the type information that comes from the
.Sy IP
module.
This means that a module only has types that are unique to
itself and the most common types in the kernel are not duplicated.
.Sh FILE FORMAT
This documents version
.Em two
of the
.Nm
file format.
All applications and tools currently produce and operate on
this version.
.Lp
The file format can be summarized with the following image.
The sections that follow cover each part in more detail.
.Bd -literal

         +-------------+  0t0
+--------| Preamble    |
|        +-------------+  0t4
|+-------| Header      |
||       +-------------+  0t36 + cth_lbloff
||+------| Labels      |
|||      +-------------+  0t36 + cth_objtoff
|||+-----| Objects     |
||||     +-------------+  0t36 + cth_funcoff
||||+----| Functions   |
|||||    +-------------+  0t36 + cth_typeoff
|||||+---| Types       |
||||||   +-------------+  0t36 + cth_stroff
||||||+--| Strings     |
|||||||  +-------------+  0t36 + cth_stroff + cth_strlen
|||||||
|||||||
|||||||
|||||||    +-- magic -   vers   flags
|||||||    |          |    |      |
|||||||   +------+------+------+------+
+---------| 0xcf | 0xf1 | 0x02 | 0x00 |
 ||||||   +------+------+------+------+
 ||||||   0      1      2      3      4
 ||||||
 ||||||    + parent label        + objects
 ||||||    |       + parent name |     + functions    + strings
 ||||||    |       |     + label |     |      + types |       + strlen
 ||||||    |       |     |       |     |      |       |       |
 ||||||   +------+------+------+------+------+-------+-------+-------+
 +--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 |
  |||||   +------+------+------+------+------+-------+-------+-------+
  |||||   0x04   0x08   0x0c   0x10   0x14    0x18    0x1c    0x20   0x24
  |||||
  |||||         + Label name
  |||||         |       + Label type
  |||||         |       |       + Next label
  |||||         |       |       |
  |||||       +-------+------+-----+
  +-----------| 0x01  | 0x42 | ... |
   ||||       +-------+------+-----+
   ||||  cth_lbloff   +0x4   +0x8  cth_objtoff
   ||||
   ||||
   |||| Symidx  0t15   0t43   0t44
   ||||       +------+------+------+-----+
   +----------| 0x00 | 0x42 | 0x36 | ... |
    |||       +------+------+------+-----+
    ||| cth_objtoff  +0x2   +0x4   +0x6   cth_funcoff
    |||
    |||        + CTF_TYPE_INFO         + CTF_TYPE_INFO
    |||        |        + Return type  |
    |||        |        |       + arg0 |
    |||       +--------+------+------+-----+
    +---------| 0x2c10 | 0x08 | 0x0c | ... |
     ||       +--------+------+------+-----+
     || cth_funcoff    +0x2   +0x4   +0x6  cth_typeoff
     ||
     ||         + ctf_stype_t for type 1
     ||         |  integer           + integer encoding
     ||         |                    |          + ctf_stype_t for type 2
     ||         |                    |          |
     ||       +--------------------+-----------+-----+
     +--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... |
      |       +--------------------+-----------+-----+
      | cth_typeoff               +0x08      +0x0c  cth_stroff
      |
      |     +--- str 0
      |     |    +--- str 1       + str 2
      |     |    |                |
      |     v    v                v
      |   +----+---+---+---+----+---+---+---+---+---+----+
      +---| \\0 | i | n | t | \\0 | f | o | o | _ | t | \\0 |
          +----+---+---+---+----+---+---+---+---+---+----+
          0    1   2   3   4    5   6   7   8   9   10   11
.Ed
.Lp
Every
.Nm
file begins with a
.Sy preamble ,
followed by a
.Sy header .
The
.Sy preamble
is defined as follows:
.Bd -literal
typedef struct ctf_preamble {
	ushort_t ctp_magic;	/* magic number (CTF_MAGIC) */
	uchar_t ctp_version;	/* data format version number (CTF_VERSION) */
	uchar_t ctp_flags;	/* flags (see below) */
} ctf_preamble_t;
.Ed
.Pp
The
.Sy preamble
is four bytes long and must be four-byte aligned.
This
.Sy preamble
defines the version of the
.Nm
file, which in turn defines the format of the rest of the header.
While the
header may change in subsequent versions, the preamble will not change
across versions, though the interpretation of its flags may change from
version to version.
The
.Em ctp_magic
member defines the magic number for the
.Nm
file format.
This must always be
.Li 0xcff1 .
If another value is encountered, then the file should not be treated as
a
.Nm
file.
The
.Em ctp_version
member defines the version of the
.Nm
file.
The current version is
.Li 2 .
It is possible to encounter an unsupported version.
In that case,
software should not try to parse the format, as it may have changed.
Finally, the
.Em ctp_flags
member describes aspects of the file which modify its interpretation.
The following flags are currently defined:
.Bd -literal
#define	CTF_F_COMPRESS		0x01
.Ed
.Pp
The flag
.Sy CTF_F_COMPRESS
indicates that the body of the
.Nm
file, all the data following the
.Sy header ,
has been compressed through the
.Sy zlib
library and its
.Sy deflate
algorithm.
If this flag is not present, then the body has not been
compressed and no special action is needed to interpret it.
All offsets
into the data, as described by the
.Sy header ,
always refer to the
.Sy uncompressed
data.
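.Lp
The checks described above can be sketched as follows.
This is only an illustration and not the
.Xr libctf 3
implementation: the names
.Sy preamble_t
and
.Sy preamble_check
are hypothetical, the structure mirrors
.Sy ctf_preamble_t
with fixed-width types, and the data is assumed to be in the byte order of
the host.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	CTF_MAGIC	0xcff1
#define	CTF_VERSION	2
#define	CTF_F_COMPRESS	0x01

/* Hypothetical mirror of ctf_preamble_t using fixed-width types. */
typedef struct preamble {
	uint16_t magic;
	uint8_t version;
	uint8_t flags;
} preamble_t;

/*
 * Returns -1 if the buffer does not start with a version two CTF
 * preamble, 0 if the body is uncompressed, and 1 if it is compressed.
 */
static int
preamble_check(const void *buf, size_t len)
{
	preamble_t p;

	if (len < sizeof (p))
		return (-1);
	/* Copy out so that the caller's buffer need not be aligned. */
	memcpy(&p, buf, sizeof (p));
	if (p.magic != CTF_MAGIC || p.version != CTF_VERSION)
		return (-1);
	return ((p.flags & CTF_F_COMPRESS) != 0 ? 1 : 0);
}
```
.Ed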
.Lp
In version two of the
.Nm
file format, the
.Sy header
denotes whether or not this
.Nm
file is the child of another
.Nm
file and also indicates the size of the remaining sections.
The
structure for the
.Sy header
logically contains a copy of the
.Sy preamble
and the two have a combined size of 36 bytes.
.Bd -literal
typedef struct ctf_header {
	ctf_preamble_t cth_preamble;
	uint_t cth_parlabel;	/* ref to name of parent lbl uniq'd against */
	uint_t cth_parname;	/* ref to basename of parent */
	uint_t cth_lbloff;	/* offset of label section */
	uint_t cth_objtoff;	/* offset of object section */
	uint_t cth_funcoff;	/* offset of function section */
	uint_t cth_typeoff;	/* offset of type section */
	uint_t cth_stroff;	/* offset of string section */
	uint_t cth_strlen;	/* length of string section in bytes */
} ctf_header_t;
.Ed
.Pp
After the
.Sy preamble ,
the next two members,
.Em cth_parlabel
and
.Em cth_parname ,
are used to identify the parent.
The values of both members are offsets
into the
.Sy string
section which point to the start of a null-terminated string.
For more
information on the encoding of strings, see the subsection on
.Sx String Identifiers .
If the value of either is zero, then there is no entry for that
member.
If the member
.Em cth_parlabel
is set, then the
.Em cth_parname
member must be set, otherwise it will not be possible to find the
parent.
If
.Em cth_parname
is set, it is not necessary to define
.Em cth_parlabel ,
as the parent may not have a label.
For more information on labels
and their interpretation, see
.Sx The Label Section .
.Lp
The remaining members (excepting
.Em cth_strlen )
describe the beginning of the corresponding sections.
These offsets are
relative to the end of the
.Sy header .
Therefore, something with an offset of 0 is at an offset of thirty-six
bytes relative to the start of the
.Nm
file.
The difference between members
indicates the size of the section itself.
Different offsets have
different alignment requirements.
The sections at
.Em cth_objtoff
and
.Em cth_funcoff
must be two-byte aligned, while the sections at
.Em cth_lbloff
and
.Em cth_typeoff
must be four-byte aligned.
The section at
.Em cth_stroff
has no alignment requirements.
To calculate the size of a given section,
excepting the
.Sy string
section, one should subtract the offset of the section from the following one.
For
example, the size of the
.Sy types
section can be calculated by subtracting
.Em cth_typeoff
from
.Em cth_stroff .
.Lp
Finally, the member
.Em cth_strlen
describes the length of the string section itself.
From it, you can also
calculate the size of the entire
.Nm
file by adding together the size of the
.Sy ctf_header_t ,
the offset of the string section in
.Em cth_stroff ,
and the size of the string section in
.Em cth_strlen .
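.Lp
As a worked example of this arithmetic, the sketch below computes the size of
the type section and of the whole uncompressed file.
The structure
.Sy hdr_offsets_t
and the function names are hypothetical stand-ins for the corresponding
.Sy ctf_header_t
members; they are not part of
.In sys/ctf.h .
.Bd -literal
```c
#include <stdint.h>

#define	HDR_SIZE	36	/* combined size of preamble and header */

/* Hypothetical mirror of the offset members of ctf_header_t. */
typedef struct hdr_offsets {
	uint32_t lbloff;
	uint32_t objtoff;
	uint32_t funcoff;
	uint32_t typeoff;
	uint32_t stroff;
	uint32_t slen;		/* stands in for cth_strlen */
} hdr_offsets_t;

/* Size of the type section: offset of the next section minus its own. */
static uint32_t
type_section_size(const hdr_offsets_t *h)
{
	return (h->stroff - h->typeoff);
}

/* Size of the whole uncompressed file, including the header. */
static uint64_t
file_size(const hdr_offsets_t *h)
{
	return ((uint64_t)HDR_SIZE + h->stroff + h->slen);
}
```
.Ed
.Lp
With the values shown in the image above (types at 0x110, strings at 0x5f4,
string length 0x611), the type section is 0x4e4 bytes long and the file is
36 + 0x5f4 + 0x611 bytes long.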
.Ss Type Identifiers
Throughout the
.Nm
data, types are referred to by identifiers.
A given
.Nm
file supports up to 32767 (0x7fff) types.
The first valid type identifier is 0x1.
When a given
.Nm
file is a child, indicated by a non-zero entry for the
.Sy header Ns 's
.Em cth_parname ,
then the first valid type identifier is 0x8000 and the last is 0xffff.
In this case, type identifiers 0x1 through 0x7fff are references to the
parent.
.Lp
The type identifier zero is a sentinel value used to indicate that there
is no type information available or it is an unknown type.
.Lp
Throughout the file format, the identifier is stored in different sized
values; however, the minimum size to represent a given identifier is a
.Sy uint16_t .
Other consumers of
.Nm
information may use larger or opaque identifiers.
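.Lp
The ranges above can be turned into a small classifier.
This is a sketch that follows the numbering exactly as stated in this
section; the enumeration and the function
.Sy type_where
are hypothetical names and do not exist in
.Xr libctf 3 .
.Bd -literal
```c
#include <stdint.h>

typedef enum type_where {
	TYPE_NONE,	/* sentinel: no type information */
	TYPE_LOCAL,	/* defined in this container */
	TYPE_PARENT,	/* defined in the parent container */
	TYPE_INVALID	/* out of range for a container without a parent */
} type_where_t;

/*
 * Classify a type identifier. is_child should be nonzero when the
 * header's cth_parname is non-zero.
 */
static type_where_t
type_where(uint16_t id, int is_child)
{
	if (id == 0)
		return (TYPE_NONE);
	if (is_child)
		return (id <= 0x7fff ? TYPE_PARENT : TYPE_LOCAL);
	return (id <= 0x7fff ? TYPE_LOCAL : TYPE_INVALID);
}
```
.Ed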
.Ss String Identifiers
String identifiers are always encoded as four byte unsigned integers
which are an offset into a string table.
The
.Nm
format supports two different string tables which have an identifier of
zero or one.
This identifier is stored in the high-order bit of the
unsigned four byte offset.
Therefore, the maximum supported offset into
one of these tables is 0x7fffffff.
.Lp
Table identifier zero always refers to the
.Sy string
section in the CTF file itself.
String table identifier one refers to an
external string table which is the ELF string table for the ELF symbol
table associated with the
.Nm
container.
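.Lp
A minimal sketch of resolving a string identifier follows.
The macro and function names here are hypothetical; the caller is assumed to
have already located both string tables and their lengths, and may pass
.Dv NULL
with a length of zero for a table that is not available.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

#define	STR_TABLE(ref)	((ref) >> 31)		/* high-order bit */
#define	STR_OFFSET(ref)	((ref) & 0x7fffffffu)	/* low 31 bits */

/*
 * Resolve a string identifier against the CTF string section (table 0)
 * or the ELF string table (table 1). Returns NULL when the table is
 * missing or the offset is out of range.
 */
static const char *
str_lookup(uint32_t ref, const char *ctfstr, size_t ctflen,
    const char *elfstr, size_t elflen)
{
	const char *tab = (STR_TABLE(ref) == 0) ? ctfstr : elfstr;
	size_t len = (STR_TABLE(ref) == 0) ? ctflen : elflen;
	size_t off = STR_OFFSET(ref);

	if (tab == NULL || off >= len)
		return (NULL);
	return (tab + off);
}
```
.Ed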
.Ss Type Encoding
Every
.Nm
type begins with metadata encoded into a
.Sy uint16_t .
This encoded information tells us three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The kind of the type
.It
Whether this type is a root type or not
.It
The length of the variable data
.El
.Lp
The 16 bits that make up the encoding are broken down such that you have
five bits for the kind, one bit for indicating whether or not it is a
root type, and 10 bits for the variable length.
This is laid out as
follows:
.Bd -literal -offset indent
+--------------------+
| kind | root | vlen |
+--------------------+
15   11   10   9    0
.Ed
.Lp
The current version of the file format defines 14 different kinds.
The
interpretation of these different kinds will be discussed in the section
.Sx The Type Section .
If a kind is encountered that is not listed below, then it is not a valid
.Nm
file.
The kinds are defined as follows:
.Bd -literal -offset indent
#define	CTF_K_UNKNOWN	0
#define	CTF_K_INTEGER	1
#define	CTF_K_FLOAT	2
#define	CTF_K_POINTER	3
#define	CTF_K_ARRAY	4
#define	CTF_K_FUNCTION	5
#define	CTF_K_STRUCT	6
#define	CTF_K_UNION	7
#define	CTF_K_ENUM	8
#define	CTF_K_FORWARD	9
#define	CTF_K_TYPEDEF	10
#define	CTF_K_VOLATILE	11
#define	CTF_K_CONST	12
#define	CTF_K_RESTRICT	13
.Ed
.Lp
Programs directly reference many types; however, other types are referenced
indirectly because they are part of some other structure.
These types that are
referenced directly and used are called
.Sy root
types.
Other types may be used indirectly, for example, a program may reference
a structure directly, but not one of its members which has a type.
That type is
not considered a
.Sy root
type.
If a type is a
.Sy root
type, then it will have bit 10 set.
.Lp
The variable length section is specific to each kind and is discussed in the
section
.Sx The Type Section .
.Lp
The following macros are useful for constructing and deconstructing the encoded
type information:
.Bd -literal -offset indent

#define	CTF_MAX_VLEN	0x3ff
#define	CTF_INFO_KIND(info)	(((info) & 0xf800) >> 11)
#define	CTF_INFO_ISROOT(info)	(((info) & 0x0400) >> 10)
#define	CTF_INFO_VLEN(info)	(((info) & CTF_MAX_VLEN))

#define	CTF_TYPE_INFO(kind, isroot, vlen) \\
	(((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN))
.Ed
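.Lp
As a worked example, a root structure with three members encodes as
(6 << 11) | (1 << 10) | 3, which is 0x3403.
The sketch below packs and unpacks such a value with plain functions rather
than the macros; the function names are hypothetical and exist only for this
illustration.
.Bd -literal
```c
#include <stdint.h>

#define	CTF_K_STRUCT	6

/* Pack a kind, root flag, and variable length into 16 bits. */
static uint16_t
info_pack(unsigned kind, int isroot, unsigned vlen)
{
	return ((uint16_t)((kind << 11) | ((isroot ? 1u : 0u) << 10) |
	    (vlen & 0x3ff)));
}

/* Bits 15-11: the kind of the type. */
static unsigned
info_kind(uint16_t info)
{
	return ((info & 0xf800) >> 11);
}

/* Bit 10: whether the type is a root type. */
static unsigned
info_isroot(uint16_t info)
{
	return ((info & 0x0400) >> 10);
}

/* Bits 9-0: the length of the variable data. */
static unsigned
info_vlen(uint16_t info)
{
	return (info & 0x3ff);
}
```
.Ed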
.Ss The Label Section
When consuming
.Nm
data, it is often useful to know whether two different
.Nm
containers come from the same source base and version.
For example, when
building illumos, there are many kernel modules that are built against a
single collection of source code.
A label is encoded into the
.Nm
files that corresponds with the particular build.
This ensures that if
files on the system were to become mixed up from multiple releases,
they are not used together by tools, particularly when a child needs to
refer to a type in the parent.
Because they are linked using the type
identifiers, if the wrong parent is used then the wrong type will be
encountered.
.Lp
Each label is encoded in the file format using the following eight byte
structure:
.Bd -literal
typedef struct ctf_lblent {
	uint_t ctl_label;	/* ref to name of label */
	uint_t ctl_typeidx;	/* last type associated with this label */
} ctf_lblent_t;
.Ed
.Lp
Each label has two different components, a name and a type identifier.
The name is encoded in the
.Em ctl_label
member which is in the format defined in the section
.Sx String Identifiers .
Generally, the names of all labels are found in the internal string
section.
.Lp
The type identifier encoded in the member
.Em ctl_typeidx
refers to the last type identifier that a label refers to in the current
file.
Labels only refer to types in the current file.
If the
.Nm
file is a child, then it will have the same label as its parent;
however, its label will only refer to its own types, not its parent's.
.Lp
It is also possible, though rather uncommon, for a
.Nm
file to have multiple labels.
Labels are placed one after another, every
eight bytes.
When multiple labels are present, types may only belong to
a single label.
.Ss The Object Section
The object section provides a mapping from ELF symbols of type
.Sy STT_OBJECT
in the symbol table to a type identifier.
Every entry in this section is
a
.Sy uint16_t
which contains a type identifier as described in the section
.Sx Type Identifiers .
If there is no information for an object, then the type identifier 0x0
is stored for that entry.
.Lp
To walk the object section, you need to have a corresponding
.Sy symbol table
in the ELF object that contains the
.Nm
data.
Not every object is included in this section.
Specifically, when
walking the symbol table, an entry is skipped if it matches any of the
following conditions:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_OBJECT
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's section index is
.Sy SHN_ABS
and the value of the symbol is zero.
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Lp
The following sample code shows an example of iterating the object
section and skipping the correct symbols:
.Bd -literal
#include <gelf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Given the start of the object section in the CTF file, the number of symbols,
 * and the ELF Data sections for the symbol table and the string table, this
 * prints the type identifiers that correspond to objects. Note, a more robust
 * implementation should ensure that it doesn't walk beyond the end of the CTF
 * object section.
 */
static int
walk_symbols(uint16_t *objtoff, Elf_Data *symdata, Elf_Data *strdata,
    long nsyms)
{
	long i;
	const char *strbase = strdata->d_buf;

	for (i = 1; i < nsyms; i++) {
		const char *name;
		GElf_Sym sym;

		if (gelf_getsym(symdata, (int)i, &sym) == NULL)
			return (1);

		if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
			continue;
		if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0)
			continue;
		if (sym.st_shndx == SHN_ABS && sym.st_value == 0)
			continue;
		name = strbase + sym.st_name;
		if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0)
			continue;

		/* Only symbols that are not skipped consume an entry. */
		(void) printf("Symbol %ld has type %u\\n", i,
		    (unsigned)*objtoff);
		objtoff++;
	}

	return (0);
}
.Ed
.Ss The Function Section
The function section of the
.Nm
file encodes the types of both the function's arguments and the function's
return type.
Similar to
.Sx The Object Section ,
the function section encodes information for all symbols of type
.Sy STT_FUNC ,
excepting those that fit specific criteria.
Unlike with objects, because
functions have a variable number of arguments, they start with a type encoding
as defined in
.Sx Type Encoding ,
which is the size of a
.Sy uint16_t .
Functions which have no type information available are encoded as
.Li CTF_TYPE_INFO(CTF_K_UNKNOWN, 0, 0) .
Functions with type information are encoded differently.
Here, the variable length is
turned into the number of arguments in the function.
If a function is a
.Sy varargs
type function, then the number of arguments is increased by one.
Functions with
type information are encoded as:
.Li CTF_TYPE_INFO(CTF_K_FUNCTION, 0, nargs) .
.Lp
For functions that have no type information, nothing else is encoded, and the
next function is encoded.
For functions with type information, the next
.Sy uint16_t
is encoded with the type identifier of the return type of the function.
It is
followed by each of the type identifiers of the arguments, if any exist, in the
order that they appear in the function.
Therefore, argument 0 is the first type
identifier and so on.
When a function has a final varargs argument, that is
encoded with the type identifier of zero.
.Lp
Like
.Sx The Object Section ,
the function section is encoded in the order of the symbol table.
It has
similar, but slightly different considerations from objects.
While iterating the
symbol table, if any of the following conditions are true, then the entry is
skipped and no corresponding entry is written:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_FUNC
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
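.Lp
Because entries in the function section vary in length, a consumer has to
decode each one to find the next.
The sketch below counts the entries in a buffer of 16-bit words using only
the rules given in this section.
The function names are hypothetical, and any padding or other details not
described here are not handled.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

#define	CTF_K_UNKNOWN	0

/*
 * Number of 16-bit words used by one function entry whose first word
 * is info: one word when there is no type information, otherwise the
 * info word, the return type, and one word per argument.
 */
static size_t
func_entry_len(uint16_t info)
{
	unsigned kind = (info & 0xf800) >> 11;
	unsigned nargs = info & 0x3ff;

	if (kind == CTF_K_UNKNOWN && nargs == 0)
		return (1);
	return (2 + (size_t)nargs);
}

/*
 * Count the entries in a function section of nwords 16-bit words.
 * Returns -1 if the last entry would run past the end of the section.
 */
static long
func_count(const uint16_t *sec, size_t nwords)
{
	size_t off = 0;
	long count = 0;

	while (off < nwords) {
		size_t len = func_entry_len(sec[off]);

		if (len > nwords - off)
			return (-1);
		off += len;
		count++;
	}
	return (count);
}
```
.Ed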
.Ss The Type Section
The type section is the heart of the
.Nm
data.
It encodes all of the information about the types themselves.
The base of
the type information comes in two forms, a short form and a long form, each of
which may be followed by a variable number of arguments.
The following
definitions describe the short and long forms:
.Bd -literal
#define	CTF_MAX_SIZE	0xfffe	/* max size of a type in bytes */
#define	CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */
#define	CTF_MAX_LSIZE	UINT64_MAX

typedef struct ctf_stype {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* size of entire type in bytes */
		ushort_t _type;	/* reference to another type */
	} _u;
} ctf_stype_t;

typedef struct ctf_type {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* always CTF_LSIZE_SENT */
		ushort_t _type; /* do not use */
	} _u;
	uint_t ctt_lsizehi;	/* high 32 bits of type size in bytes */
	uint_t ctt_lsizelo;	/* low 32 bits of type size in bytes */
} ctf_type_t;

#define	ctt_size _u._size	/* for fundamental types that have a size */
#define	ctt_type _u._type	/* for types that reference another type */
.Ed
.Pp
Type sizes are stored in
.Sy bytes .
The basic small form uses a
.Sy ushort_t
to store the number of bytes.
If the number of bytes in a structure would exceed
0xfffe, then the alternate form, the
.Sy ctf_type_t ,
is used instead.
To indicate that the larger form is being used, the member
.Em ctt_size
is set to the value of
.Sy CTF_LSIZE_SENT
(0xffff).
In general, when going through the type section, consumers use the
.Sy ctf_type_t
structure, but pay attention to the value of the member
.Em ctt_size
to determine whether they should increment their scan by the size of the
.Sy ctf_stype_t
or
.Sy ctf_type_t .
Not all kinds of types use
.Sy ctt_size .
Those which do not will always use the
.Sy ctf_stype_t
structure.
The individual sections for each kind have more information.
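.Lp
The choice between the two forms can be sketched as follows.
The structure
.Sy type_hdr_t
is a hypothetical fixed-width mirror of
.Sy ctf_type_t ,
and the two functions only apply to kinds that use
.Em ctt_size ;
the last two members must not be read unless the sentinel is present.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

#define	CTF_LSIZE_SENT	0xffff

/* Hypothetical fixed-width mirror of ctf_type_t. */
typedef struct type_hdr {
	uint32_t name;
	uint16_t info;
	uint16_t size;		/* ctt_size */
	uint32_t lsizehi;	/* only present in the long form */
	uint32_t lsizelo;	/* only present in the long form */
} type_hdr_t;

/* Bytes occupied by the fixed part of the type: 8 (short) or 16 (long). */
static size_t
type_hdr_len(const type_hdr_t *t)
{
	return (t->size == CTF_LSIZE_SENT ? 16 : 8);
}

/* Size of the described type in bytes, for kinds that use ctt_size. */
static uint64_t
type_size(const type_hdr_t *t)
{
	if (t->size != CTF_LSIZE_SENT)
		return (t->size);
	return (((uint64_t)t->lsizehi << 32) | t->lsizelo);
}
```
.Ed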
.Lp
Types are written out in order.
Therefore the first entry encountered has a type
id of 0x1, or 0x8000 if a child.
The member
.Em ctt_name
is encoded as described in the section
.Sx String Identifiers .
The string that it points to is the name of the type.
If the identifier points
to an empty string (one that consists solely of a null terminator) then the type
does not have a name.
This is common with anonymous structures and unions that
only have a typedef to name them, as well as pointers and qualifiers.
.Lp
The next member, the
.Em ctt_info ,
is encoded as described in the section
.Sx Type Encoding .
The type's kind tells us how to interpret the remaining data in the
.Sy ctf_type_t
and any variable length data that may exist.
The rest of this section will be
broken down into the interpretation of the various kinds.
.Ss Encoding of Integers
Integers, which are of type
.Sy CTF_K_INTEGER ,
have no variable length arguments.
Instead, they are followed by a four byte
.Sy uint_t
which describes their encoding.
All integers must be encoded with a variable
length of zero.
The
.Em ctt_size
member describes the length of the integer in bytes.
In general, integer sizes
will be rounded up to the closest power of two.
.Lp
The integer encoding contains three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The encoding of the integer
.It
The offset in
.Sy bits
of the type
.It
The size in
.Sy bits
of the type
.El
.Pp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_INT_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_INT_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Pp
The following flags are defined for the encoding at this time:
.Bd -literal -offset indent
#define	CTF_INT_SIGNED		0x01
#define	CTF_INT_CHAR		0x02
#define	CTF_INT_BOOL		0x04
#define	CTF_INT_VARARGS		0x08
.Ed
.Lp
By default, an integer is considered to be unsigned, unless it has the
.Sy CTF_INT_SIGNED
flag set.
If the flag
.Sy CTF_INT_CHAR
is set, that indicates that the integer is of a type that stores character
data, for example the intrinsic C type
.Sy char
would have the
.Sy CTF_INT_CHAR
flag set.
If the flag
.Sy CTF_INT_BOOL
is set, that indicates that the integer represents a boolean type.
For example,
the intrinsic C type
.Sy _Bool
would have the
.Sy CTF_INT_BOOL
flag set.
Finally, the flag
.Sy CTF_INT_VARARGS
indicates that the integer is used as part of a variable number of arguments.
This encoding is rather uncommon.
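.Lp
As a worked example, a signed 32-bit integer with no bit offset encodes as
(0x01 << 24) | (0 << 16) | 32, which is 0x01000020.
The sketch below performs the same packing and unpacking with functions
instead of the macros; the function names are hypothetical.
.Bd -literal
```c
#include <stdint.h>

#define	CTF_INT_SIGNED	0x01
#define	CTF_INT_CHAR	0x02

/* Pack the encoding flags, bit offset, and bit size into 32 bits. */
static uint32_t
int_data(unsigned encoding, unsigned offset, unsigned bits)
{
	return (((uint32_t)encoding << 24) | ((uint32_t)offset << 16) |
	    (uint32_t)bits);
}

/* Bits 31-24: the encoding flags. */
static unsigned
int_encoding(uint32_t data)
{
	return ((data & 0xff000000u) >> 24);
}

/* Bits 23-16: the offset of the integer in bits. */
static unsigned
int_offset(uint32_t data)
{
	return ((data & 0x00ff0000u) >> 16);
}

/* Bits 15-0: the size of the integer in bits. */
static unsigned
int_bits(uint32_t data)
{
	return (data & 0x0000ffffu);
}
```
.Ed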
.Ss Encoding of Floats
Floats, which are of type
.Sy CTF_K_FLOAT ,
are similar to their integer counterparts.
They have no variable length
arguments and are followed by a four byte encoding which describes the kind of
float that exists.
The
.Em ctt_size
member is the size, in bytes, of the float.
The float encoding has three
different pieces of information inside of it:
.Lp
.Bl -bullet -offset indent -compact
.It
The specific kind of float that exists
.It
The offset in
.Sy bits
of the float
.It
The size in
.Sy bits
of the float
.El
.Lp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_FP_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_FP_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_FP_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Lp
Whereas the encoding for integers was a series of flags, the encoding for
floats maps to a specific kind of float.
It is not a flag-based value.
The kinds of floats
correspond to both their size and their encoding.
This covers all of the basic C
intrinsic floating point types.
The following are the different kinds of floats
represented in the encoding:
.Bd -literal -offset indent
#define	CTF_FP_SINGLE	1	/* IEEE 32-bit float encoding */
#define	CTF_FP_DOUBLE	2	/* IEEE 64-bit float encoding */
#define	CTF_FP_CPLX	3	/* Complex encoding */
#define	CTF_FP_DCPLX	4	/* Double complex encoding */
#define	CTF_FP_LDCPLX	5	/* Long double complex encoding */
#define	CTF_FP_LDOUBLE	6	/* Long double encoding */
#define	CTF_FP_INTRVL	7	/* Interval (2x32-bit) encoding */
#define	CTF_FP_DINTRVL	8	/* Double interval (2x64-bit) encoding */
#define	CTF_FP_LDINTRVL	9	/* Long double interval (2x128-bit) encoding */
#define	CTF_FP_IMAGRY	10	/* Imaginary (32-bit) encoding */
#define	CTF_FP_DIMAGRY	11	/* Long imaginary (64-bit) encoding */
#define	CTF_FP_LDIMAGRY	12	/* Long double imaginary (128-bit) encoding */
.Ed
.Ss Encoding of Arrays
Arrays, which are of type
.Sy CTF_K_ARRAY ,
have no variable length arguments.
They are followed by a structure which
describes the number of elements in the array, the type identifier of the
elements in the array, and the type identifier of the index of the array.
With
arrays, the
.Em ctt_size
member is set to zero.
The structure that follows an array is defined as:
.Bd -literal
typedef struct ctf_array {
	ushort_t cta_contents;	/* reference to type of array contents */
	ushort_t cta_index;	/* reference to type of array index */
	uint_t cta_nelems;	/* number of elements */
} ctf_array_t;
.Ed
.Lp
The
.Em cta_contents
and
.Em cta_index
members of the
.Sy ctf_array_t
are type identifiers which are encoded as per the section
.Sx Type Identifiers .
The member
.Em cta_nelems
is a simple four byte unsigned count of the number of elements.
This count may
be zero when encountering C99's flexible array members.
.Ss Encoding of Functions
Function types, which are of type
.Sy CTF_K_FUNCTION ,
use the variable length argument to hold the number of arguments in the
function.
When
the function's final argument is a varargs, the argument count is
incremented by one to account for the variable argument.
Here, the
.Em ctt_type
member is encoded with the type identifier of the return type of the function.
Note that the
.Em ctt_size
member is not used here.
.Lp
The variable list contains the type identifiers for the arguments of
the function, if any.
Each one is represented by a
.Sy uint16_t
and encoded according to the
.Sx Type Identifiers
section.
If the function's last argument is of type varargs, then it is also
written out, but the type identifier is zero.
This is included in the count of
the function's arguments.
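.Lp
The handling of the trailing varargs entry can be sketched as follows.
The type and function names are hypothetical and exist only for
illustration; the caller is assumed to have already located the list of
.Sy uint16_t
type identifiers and the count taken from the variable length argument.
.Bd -literal
```c
#include <stdint.h>

/* Result of decoding a function's argument list (hypothetical type). */
typedef struct sketch_func_info {
	uint16_t sfi_nargs;	/* named arguments, varargs slot excluded */
	int sfi_varargs;	/* nonzero if the function is variadic */
} sketch_func_info_t;

/*
 * args holds vlen type identifiers. A trailing identifier of zero
 * marks varargs and is counted in vlen, so it is subtracted back out.
 */
static sketch_func_info_t
sketch_decode_func_args(const uint16_t *args, uint16_t vlen)
{
	sketch_func_info_t info = { vlen, 0 };

	if (vlen > 0 && args[vlen - 1] == 0) {
		info.sfi_varargs = 1;
		info.sfi_nargs = (uint16_t)(vlen - 1);
	}
	return (info);
}
```
.Ed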
.Ss Encoding of Structures and Unions
Structures and unions, which are encoded with
.Sy CTF_K_STRUCT
and
.Sy CTF_K_UNION
respectively, are very similar constructs in C.
The main difference
between them is that the members of a structure follow one another,
whereas in a union, all members share the same memory.
They are also very
similar in terms of their encoding in
.Nm .
The variable length argument for structures and unions represents the number of
members that they have.
The value of the member
.Em ctt_size
is the size of the structure or union.
There are two different structures which
are used to encode members in the variable list.
When the size of a structure or
union is strictly less than the large member threshold, 8192 bytes, each member
is encoded as a
.Sy ctf_member_t ;
otherwise, each member is encoded as a
.Sy ctf_lmember_t .
All members of a given structure or union are encoded using the same structure.
The structures for members are as follows:
.Bd -literal
typedef struct ctf_member {
	uint_t ctm_name;	/* reference to name in string table */
	ushort_t ctm_type;	/* reference to type of member */
	ushort_t ctm_offset;	/* offset of this member in bits */
} ctf_member_t;

typedef struct ctf_lmember {
	uint_t ctlm_name;	/* reference to name in string table */
	ushort_t ctlm_type;	/* reference to type of member */
	ushort_t ctlm_pad;	/* padding */
	uint_t ctlm_offsethi;	/* high 32 bits of member offset in bits */
	uint_t ctlm_offsetlo;	/* low 32 bits of member offset in bits */
} ctf_lmember_t;
.Ed
.Lp
Both the
.Em ctm_name
and
.Em ctlm_name
refer to the name of the member.
The name is encoded as an offset into the
string table as described by the section
.Sx String Identifiers .
The members
.Em ctm_type
and
.Em ctlm_type
both refer to the type of the member.
They are encoded as per the section
.Sx Type Identifiers .
.Lp
The last piece of information that is present is the offset at which the
member begins in memory.
For unions, this value will always
be zero because every member of a union begins at the start of the union.
For structures,
this is the offset in
.Sy bits
that the member begins at.
Note that a compiler may lay out a type with padding.
This means that the difference in offset between two consecutive members may be
larger than the size of the member.
When the size of the overall structure is
strictly less than 8192 bytes, the normal structure,
.Sy ctf_member_t ,
is used and the offset in bits is stored in the member
.Em ctm_offset .
However, when the size of the structure is greater than or equal to 8192 bytes,
then the number of bits is split into two 32-bit quantities.
One member,
.Em ctlm_offsethi ,
represents the upper 32 bits of the offset, while the other member,
.Em ctlm_offsetlo ,
represents the lower 32 bits of the offset.
These can be joined together to get
a 64-bit offset in bits by shifting the member
.Em ctlm_offsethi
to the left by 32 bits and then performing a bitwise OR with
.Em ctlm_offsetlo .
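.Lp
The choice of member structure and the reassembly of the large offset
might be sketched as follows.
The
.Sy sketch_
and
.Sy SKETCH_
names are hypothetical, and fixed-width types stand in for
.Sy ushort_t
and
.Sy uint_t .
.Bd -literal
```c
#include <stdint.h>

#define	SKETCH_LSTRUCT_THRESH	8192	/* large member threshold, bytes */

/* Fixed-width mirror of ctf_lmember_t (hypothetical name). */
typedef struct sketch_ctf_lmember {
	uint32_t ctlm_name;
	uint16_t ctlm_type;
	uint16_t ctlm_pad;
	uint32_t ctlm_offsethi;
	uint32_t ctlm_offsetlo;
} sketch_ctf_lmember_t;

/* Nonzero when members of a type of this size use the large form. */
static int
sketch_uses_lmember(uint64_t size_in_bytes)
{
	return (size_in_bytes >= SKETCH_LSTRUCT_THRESH);
}

/* Join the two 32-bit halves into one 64-bit offset in bits. */
static uint64_t
sketch_lmember_offset(const sketch_ctf_lmember_t *m)
{
	return (((uint64_t)m->ctlm_offsethi << 32) | m->ctlm_offsetlo);
}
```
.Ed
.Lp
The cast to a 64-bit type before shifting matters: shifting a 32-bit
value left by 32 bits is undefined in C.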
.Ss Encoding of Enumerations
Enumerations, noted by the type
.Sy CTF_K_ENUM ,
are similar to structures.
Enumerations use the variable list to note the number
of values that the enumeration contains, which we'll term enumerators.
In C, an
enumeration is always equivalent to the intrinsic type
.Sy int ;
thus the value of the member
.Em ctt_size
is always the size of an integer, which is determined by the current model.
For illumos systems, this will always be 4, as an integer is always defined to
be 4 bytes in both
.Sy ILP32
and
.Sy LP64 ,
regardless of the architecture.
.Lp
The enumerators encoded in an enumeration have the following structure in the
variable list:
.Bd -literal
typedef struct ctf_enum {
	uint_t cte_name;	/* reference to name in string table */
	int cte_value;		/* value associated with this name */
} ctf_enum_t;
.Ed
.Lp
The member
.Em cte_name
refers to the name of the enumerator.
It is encoded according to the
rules in the section
.Sx String Identifiers .
The member
.Em cte_value
contains the integer value of this enumerator.
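.Lp
A consumer that wants to map a value back to its enumerator could scan
the variable list, as in the sketch below.
The names are hypothetical, and the caller is assumed to have already
located the list and its length.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width mirror of ctf_enum_t (hypothetical name). */
typedef struct sketch_ctf_enum {
	uint32_t cte_name;	/* reference to name in string table */
	int32_t cte_value;	/* value associated with this name */
} sketch_ctf_enum_t;

/* Return the first enumerator with the given value, or NULL if none. */
static const sketch_ctf_enum_t *
sketch_enum_find(const sketch_ctf_enum_t *ents, uint16_t vlen, int32_t value)
{
	uint16_t i;

	for (i = 0; i < vlen; i++) {
		if (ents[i].cte_value == value)
			return (&ents[i]);
	}
	return (NULL);
}
```
.Ed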
.Ss Encoding of Forward References
Forward references, types of kind
.Sy CTF_K_FORWARD ,
in a
.Nm
file refer to types which may not have a definition at all, only a name.
If
the
.Nm
file is a child, then the forward may be resolved to an
actual type in the parent; otherwise, the definition may be in another
.Nm
container or may not be known at all.
The only member of the
.Sy ctf_type_t
that matters for a forward declaration is the
.Em ctt_name
which points to the name of the forward reference in the string table as
described earlier.
There is no other information recorded for forward
references.
.Ss Encoding of Pointers, Typedefs, Volatile, Const, and Restrict
Pointers, typedefs, volatile, const, and restrict are all similar in
.Nm .
They all refer to another type.
In the case of typedefs, they provide an
alternate name, while volatile, const, and restrict change how the type is
interpreted in the C programming language.
This covers the
.Nm
kinds
.Sy CTF_K_POINTER ,
.Sy CTF_K_TYPEDEF ,
.Sy CTF_K_VOLATILE ,
.Sy CTF_K_RESTRICT ,
and
.Sy CTF_K_CONST .
.Lp
These types have no variable list entries and use the member
.Em ctt_type
to refer to the base type that they modify.
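.Lp
Because each of these kinds simply refers to another type, a consumer can
strip typedefs and qualifiers by following the references, as in the
sketch below.
Everything here is a simplifying assumption made for illustration: the
in-memory table, the
.Sy sketch_
names, the enumeration of kinds, and the use of a type identifier as a
direct index into the table.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

/* Simplified kinds for this sketch; not the CTF_K_* constants. */
typedef enum sketch_kind {
	SK_INTEGER, SK_POINTER, SK_TYPEDEF,
	SK_VOLATILE, SK_CONST, SK_RESTRICT
} sketch_kind_t;

/* Hypothetical decoded type: its kind and the type it refers to. */
typedef struct sketch_type {
	sketch_kind_t st_kind;
	uint16_t st_type;
} sketch_type_t;

/*
 * Follow typedef and qualifier references until another kind is
 * reached. The hop limit keeps malformed, self-referential data from
 * looping forever; an out-of-range identifier is returned unchanged.
 */
static uint16_t
sketch_resolve(const sketch_type_t *types, size_t ntypes, uint16_t id)
{
	size_t hops;

	for (hops = 0; hops < ntypes && id < ntypes; hops++) {
		switch (types[id].st_kind) {
		case SK_TYPEDEF:
		case SK_VOLATILE:
		case SK_CONST:
		case SK_RESTRICT:
			id = types[id].st_type;
			break;
		default:
			return (id);
		}
	}
	return (id);
}
```
.Ed
.Lp
Pointers are deliberately not followed here, since a pointer to a type is
a distinct type rather than another name for it.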
.Ss Encoding of Unknown Types
Types with the kind
.Sy CTF_K_UNKNOWN
are used to indicate gaps in the type identifier space.
These entries consume an
identifier, but do not define anything.
Nothing should refer to these gap
identifiers.
.Ss Dependencies Between Types
C types can be imagined as a directed, possibly cyclic, graph.
Structures and unions may
refer to each other in a way that creates a cyclic dependency.
In cases such as
these, the entire type section must be read in and processed.
Consumers must
not assume that every type can be laid out in dependency order; they
cannot.
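.Lp
For example, the following pair of structures, whose names are chosen
purely for illustration, refer to each other through pointers.
Neither can be described completely before the other:
.Bd -literal
```c
struct sketch_b;

struct sketch_a {
	struct sketch_b *a_peer;	/* refers to sketch_b */
};

struct sketch_b {
	struct sketch_a *b_peer;	/* refers back to sketch_a */
};
```
.Ed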
.Ss The String Section
The last section of the
.Nm
file is the
.Sy string
section.
This section encodes all of the strings that appear throughout
the other sections.
It is laid out as a series of strings, each of which is a sequence of
characters followed by a null terminator.
Generally, all names are written out in ASCII, as
most C compilers do not allow any characters to appear in identifiers
outside of a subset of ASCII.
However, any extended character sets
should be written out as a series of UTF-8 bytes.
.Lp
The first entry in the section, at offset zero, is a single null
terminator to reference the empty string.
Following that, each C string
should be written out, including the null terminator.
Offsets that refer
to something in this section should refer to the first byte which begins
a string.
Beyond the first byte in the section being the null
terminator, the order of strings is unimportant.
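.Lp
A lookup in this section might be sketched as follows.
The function name is hypothetical, and the offset is assumed to be a
plain byte offset into this section, with any other encoding described in
.Sx String Identifiers
already handled by the caller.
.Bd -literal
```c
#include <stddef.h>
#include <string.h>

/*
 * Return the string beginning at byte offset off of a string section
 * that is len bytes long, or NULL if the offset is out of range or
 * the string is not terminated within the section.
 */
static const char *
sketch_strtab_get(const char *strtab, size_t len, size_t off)
{
	if (off >= len)
		return (NULL);
	if (memchr(strtab + off, 0, len - off) == NULL)
		return (NULL);
	return (strtab + off);
}
```
.Ed
.Lp
With this layout, offset zero always yields the empty string.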
.Ss Data Encoding and ELF Considerations
.Nm
data is generally included in ELF objects which specify information to
identify the architecture and endianness of the file.
A
.Nm
container inside such an object must match the endianness of the ELF
object.
Aside from the question of the endian encoding of data, there
should be no other differences between architectures.
While many of the
types in this document refer to non-fixed size C integral types, they
are equivalent in the models
.Sy ILP32
and
.Sy LP64 .
If
.Nm
data is used with any other model in which those integral types have
different sizes, then it must not use the model's sizes for
those integral types and instead use the fixed size equivalents based on an
.Sy ILP32
environment.
.Lp
When placing a
.Nm
container inside of an ELF object, there are certain conventions that
tooling relies on to find the
.Nm
data.
In particular, a given ELF object should only contain a single
.Nm
section.
Multiple containers should be merged together into a single
one.
.Lp
The
.Nm
file should be included in its own ELF section.
The section's name
must be
.Ql .SUNW_ctf .
The type of the section must be
.Sy SHT_PROGBITS .
The section should have a link set to the symbol table and its address
alignment must be 4.
.Sh SEE ALSO
.Xr dtrace 1 ,
.Xr elf 3 ,
.Xr gelf 3 ,
.Xr a.out 5 ,
.Xr elf 5