.\" $NetBSD: ctf.5,v 1.3 2015/09/29 06:33:01 wiz Exp $
.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2014 Joyent, Inc.
.\"
.Dd September 26, 2014
.Dt CTF 5
.Os
.Sh NAME
.Nm ctf
.Nd Compact C Type Format
.Sh SYNOPSIS
.In sys/ctf.h
.Sh DESCRIPTION
.Nm
is designed to be a compact representation of the C programming
language's type information focused on serving the needs of dynamic
tracing, debuggers, and other in-situ and post-mortem introspection
tools.
.Nm
data is generally included in
.Sy ELF
objects and is tagged as
.Sy SHT_PROGBITS
to ensure that the data is accessible in a running process and in subsequent
core dumps, if generated.
.Lp
The
.Nm
data contained in each file has information about the layout and
sizes of C types, including intrinsic types, enumerations, structures,
typedefs, and unions, that are used by the corresponding
.Sy ELF
object.
The
.Nm
data may also include information about the types of global objects and
the return type and arguments of functions in the symbol table.
.Lp
Because a
.Nm
file is often embedded inside a file, rather than being a standalone
file itself, it may also be referred to as a
.Nm
.Sy container .
.Lp
On illumos systems,
.Nm
data is consumed by multiple programs.
It can be used by
.\" the modular
.\" debugger,
.\" .Xr mdb 1 ,
.\" as well as by
.Xr dtrace 1 .
Programmatic access to
.Nm
data can be obtained through
.Xr libctf 3 .
.Lp
The
.Nm
file format is broken down into seven different sections.
The first
section is the
.Sy preamble
and
.Sy header ,
which describes the version of the
.Nm
file, links it has to other
.Nm
files, and the sizes of the other sections.
The next section is the
.Sy label
section,
which provides a way of identifying similar groups of
.Nm
data across multiple files.
This is followed by the
.Sy object
information section, which describes the types of global
symbols.
The subsequent section is the
.Sy function
information section, which describes the return
types and arguments of functions.
The next section is the
.Sy type
information section, which describes
the format and layout of the C types themselves, and finally the last
section is the
.Sy string
section, which contains the names of types, enumerations, members, and
labels.
.Lp
Strictly speaking, only the
.Sy preamble
and
.Sy header
are required; however, to be actually useful, both the type and string
sections are necessary.
.Lp
A
.Nm
file may contain all of the type information that it requires, or it
may optionally refer to another
.Nm
file which holds the remaining types.
When a
.Nm
file refers to another file, it is called the
.Sy child
and the file it refers to is called the
.Sy parent .
A given file may only refer to one parent.
This process is called
.Em uniquification
because it ensures each child only has type information that is
unique to it.
A common example of this is that most kernel modules in
illumos are uniquified against the kernel module
.Sy genunix
and the type information that comes from the
.Sy IP
module.
This means that a module only has types that are unique to
itself and the most common types in the kernel are not duplicated.
.Sh FILE FORMAT
This documents version
.Em two
of the
.Nm
file format.
All applications and tools currently produce and operate on
this version.
.Lp
The file format can be summarized with the following image; the
following sections will cover this in more detail.
.Bd -literal

         +-------------+ 0t0
+--------|  Preamble   |
|        +-------------+ 0t4
|+-------|   Header    |
||       +-------------+ 0t36 + cth_lbloff
||+------|   Labels    |
|||      +-------------+ 0t36 + cth_objtoff
|||+-----|   Objects   |
||||     +-------------+ 0t36 + cth_funcoff
||||+----|  Functions  |
|||||    +-------------+ 0t36 + cth_typeoff
|||||+---|    Types    |
||||||   +-------------+ 0t36 + cth_stroff
||||||+--|   Strings   |
|||||||  +-------------+ 0t36 + cth_stroff + cth_strlen
|||||||
|||||||
|||||||
|||||||    +-- magic -   vers   flags
|||||||    |          |    |      |
|||||||   +------+------+------+------+
+---------| 0xcf | 0xf1 | 0x02 | 0x00 |
 ||||||   +------+------+------+------+
 ||||||   0      1      2      3      4
 ||||||
 ||||||    + parent label       + objects
 ||||||    |      + parent name |      + functions    + strings
 ||||||    |      |      + label|      |      + types |       + strlen
 ||||||    |      |      |      |      |      |       |       |
 ||||||   +------+------+------+------+------+-------+-------+-------+
 +--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 |
  |||||   +------+------+------+------+------+-------+-------+-------+
  |||||   0x04   0x08   0x0c   0x10   0x14   0x18    0x1c    0x20    0x24
  |||||
  |||||        + Label name
  |||||        |       + Label type
  |||||        |       |      + Next label
  |||||        |       |      |
  |||||       +-------+------+-----+
  +-----------| 0x01  | 0x42 | ... |
   ||||       +-------+------+-----+
   ||||   cth_lbloff  +0x4   +0x8  cth_objtoff
   ||||
   ||||
   |||| Symidx 0t15   0t43   0t44
   ||||       +------+------+------+-----+
   +----------| 0x00 | 0x42 | 0x36 | ... |
    |||       +------+------+------+-----+
    ||| cth_objtoff  +0x2   +0x4   +0x6  cth_funcoff
    |||
    |||        + CTF_TYPE_INFO        + CTF_TYPE_INFO
    |||        |        + Return type |
    |||        |        |      + arg0 |
    |||       +--------+------+------+-----+
    +---------| 0x2c10 | 0x08 | 0x0c | ... |
     ||       +--------+------+------+-----+
     || cth_funcoff    +0x2   +0x4   +0x6  cth_typeoff
     ||
     ||        + ctf_stype_t for type 1
     ||        | integer            + integer encoding
     ||        |                    |           + ctf_stype_t for type 2
     ||        |                    |           |
     ||       +--------------------+-----------+-----+
     +--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... |
      |       +--------------------+-----------+-----+
      | cth_typeoff                +0x08       +0x0c cth_stroff
      |
      |    +--- str 0
      |    |    +--- str 1       + str 2
      |    |    |                |
      |    v    v                v
      |   +----+---+---+---+----+---+---+---+---+---+----+
      +---| \\0 | i | n | t | \\0 | f | o | o | _ | t | \\0 |
          +----+---+---+---+----+---+---+---+---+---+----+
          0    1   2   3   4    5   6   7   8   9   10   11
.Ed
.Lp
Every
.Nm
file begins with a
.Sy preamble ,
followed by a
.Sy header .
The
.Sy preamble
is defined as follows:
.Bd -literal
typedef struct ctf_preamble {
	ushort_t ctp_magic;	/* magic number (CTF_MAGIC) */
	uchar_t ctp_version;	/* data format version number (CTF_VERSION) */
	uchar_t ctp_flags;	/* flags (see below) */
} ctf_preamble_t;
.Ed
.Pp
The
.Sy preamble
is four bytes long and must be four byte aligned.
This
.Sy preamble
defines the version of the
.Nm
file which defines the format of the rest of the header.
While the
header may change in subsequent versions, the preamble will not change
across versions, though the interpretation of its flags may change from
version to version.
The
.Em ctp_magic
member defines the magic number for the
.Nm
file format.
This must always be
.Li 0xcff1 .
If another value is encountered, then the file should not be treated as
a
.Nm
file.
The
.Em ctp_version
member defines the version of the
.Nm
file.
The current version is
.Li 2 .
It is possible to encounter an unsupported version.
In that case,
software should not try to parse the format, as it may have changed.
Finally, the
.Em ctp_flags
member describes aspects of the file which modify its interpretation.
The following flags are currently defined:
.Bd -literal
#define	CTF_F_COMPRESS	0x01
.Ed
.Pp
The flag
.Sy CTF_F_COMPRESS
indicates that the body of the
.Nm
file, all the data following the
.Sy header ,
has been compressed through the
.Sy zlib
library and its
.Sy deflate
algorithm.
If this flag is not present, then the body has not been
compressed and no special action is needed to interpret it.
All offsets
into the data as described by the
.Sy header
always refer to the
.Sy uncompressed
data.
.Lp
In version two of the
.Nm
file format, the
.Sy header
denotes whether or not this
.Nm
file is the child of another
.Nm
file and also indicates the size of the remaining sections.
The
structure for the
.Sy header
logically contains a copy of the
.Sy preamble
and the two have a combined size of 36 bytes.
.Bd -literal
typedef struct ctf_header {
	ctf_preamble_t cth_preamble;
	uint_t cth_parlabel;	/* ref to name of parent lbl uniq'd against */
	uint_t cth_parname;	/* ref to basename of parent */
	uint_t cth_lbloff;	/* offset of label section */
	uint_t cth_objtoff;	/* offset of object section */
	uint_t cth_funcoff;	/* offset of function section */
	uint_t cth_typeoff;	/* offset of type section */
	uint_t cth_stroff;	/* offset of string section */
	uint_t cth_strlen;	/* length of string section in bytes */
} ctf_header_t;
.Ed
.Pp
After the
.Sy preamble ,
the next two members,
.Em cth_parlabel
and
.Em cth_parname ,
are used to identify the parent.
The values of both members are offsets
into the
.Sy string
section which point to the start of a null-terminated string.
For more
information on the encoding of strings, see the subsection on
.Sx String Identifiers .
If the value of either is zero, then there is no entry for that
member.
If the member
.Em cth_parlabel
is set, then the
.Em cth_parname
member must be set, otherwise it will not be possible to find the
parent.
If
.Em cth_parname
is set, it is not necessary to define
.Em cth_parlabel ,
as the parent may not have a label.
For more information on labels
and their interpretation, see
.Sx The Label Section .
.Lp
The remaining members (excepting
.Em cth_strlen )
describe the beginning of the corresponding sections.
These offsets are
relative to the end of the
.Sy header .
Therefore, something with an offset of 0 is at an offset of thirty-six
bytes relative to the start of the
.Nm
file.
The difference between members
indicates the size of the section itself.
Different offsets have
different alignment requirements.
The sections that start at
.Em cth_objtoff
and
.Em cth_funcoff
must be two byte aligned, while the sections that start at
.Em cth_lbloff
and
.Em cth_typeoff
must be four byte aligned.
The section that starts at
.Em cth_stroff
has no alignment requirements.
To calculate the size of a given section,
excepting the
.Sy string
section, one should subtract the offset of the section from the following one.
For
example, the size of the
.Sy types
section can be calculated by subtracting
.Em cth_typeoff
from
.Em cth_stroff .
.Lp
Finally, the member
.Em cth_strlen
describes the length of the string section itself.
From it, you can also
calculate the size of the entire
.Nm
file by adding together the size of the
.Sy ctf_header_t ,
the offset of the string section in
.Em cth_stroff ,
and the size of the string section in
.Em cth_strlen .
.Ss Type Identifiers
Throughout the
.Nm ctf
data, types are referred to by identifiers.
A given
.Nm
file supports up to 32767 (0x7fff) types.
The first valid type identifier is 0x1.
When a given
.Nm
file is a child, indicated by a non-zero entry for the
.Sy header Ns 's
.Em cth_parname ,
then the first valid type identifier is 0x8000 and the last is 0xffff.
In this case, type identifiers 0x1 through 0x7fff are references to the
parent.
.Lp
The type identifier zero is a sentinel value used to indicate that there
is no type information available or it is an unknown type.
.Lp
Throughout the file format, the identifier is stored in different sized
values; however, the minimum size to represent a given identifier is a
.Sy uint16_t .
Other consumers of
.Nm
information may use larger or opaque identifiers.
.Ss String Identifiers
String identifiers are always encoded as four byte unsigned integers
which are an offset into a string table.
The
.Nm
format supports two different string tables which have an identifier of
zero or one.
This identifier is stored in the high-order bit of the
unsigned four byte offset.
Therefore, the maximum supported offset into
one of these tables is 0x7fffffff.
.Lp
Table identifier zero always refers to the
.Sy string
section in the CTF file itself.
String table identifier one refers to an
external string table which is the ELF string table for the ELF symbol
table associated with the
.Nm
container.
.Ss Type Encoding
Every
.Nm
type begins with metadata encoded into a
.Sy uint16_t .
This encoded information tells us three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The kind of the type
.It
Whether this type is a root type or not
.It
The length of the variable data
.El
.Lp
The 16 bits that make up the encoding are broken down such that you have
five bits for the kind, one bit for indicating whether or not it is a
root type, and 10 bits for the variable length.
This is laid out as
follows:
.Bd -literal -offset indent
+--------------------+
| kind | root | vlen |
+--------------------+
15   11   10   9    0
.Ed
.Lp
The current version of the file format defines 14 different kinds.
The
interpretation of these different kinds will be discussed in the section
.Sx The Type Section .
If a kind is encountered that is not listed below, then it is not a valid
.Nm
file.
The kinds are defined as follows:
.Bd -literal -offset indent
#define	CTF_K_UNKNOWN	0
#define	CTF_K_INTEGER	1
#define	CTF_K_FLOAT	2
#define	CTF_K_POINTER	3
#define	CTF_K_ARRAY	4
#define	CTF_K_FUNCTION	5
#define	CTF_K_STRUCT	6
#define	CTF_K_UNION	7
#define	CTF_K_ENUM	8
#define	CTF_K_FORWARD	9
#define	CTF_K_TYPEDEF	10
#define	CTF_K_VOLATILE	11
#define	CTF_K_CONST	12
#define	CTF_K_RESTRICT	13
.Ed
.Lp
Programs directly reference many types; however, other types are referenced
indirectly because they are part of some other structure.
The types that are
referenced directly and used are called
.Sy root
types.
Other types may be used indirectly, for example, a program may reference
a structure directly, but not one of its members which has a type.
That type is
not considered a
.Sy root
type.
If a type is a
.Sy root
type, then it will have bit 10 set.
.Lp
The variable length section is specific to each kind and is discussed in the
section
.Sx The Type Section .
.Lp
The following macros are useful for constructing and deconstructing the encoded
type information:
.Bd -literal -offset indent

#define	CTF_MAX_VLEN			0x3ff
#define	CTF_INFO_KIND(info)		(((info) & 0xf800) >> 11)
#define	CTF_INFO_ISROOT(info)		(((info) & 0x0400) >> 10)
#define	CTF_INFO_VLEN(info)		(((info) & CTF_MAX_VLEN))

#define	CTF_TYPE_INFO(kind, isroot, vlen) \\
	(((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN))
.Ed
.Ss The Label Section
When consuming
.Nm
data, it is often useful to know whether two different
.Nm
containers come from the same source base and version.
For example, when
building illumos, there are many kernel modules that are built against a
single collection of source code.
A label is encoded into the
.Nm
files that corresponds with the particular build.
This ensures that if
files on the system were to become mixed up from multiple releases,
they are not used together by tools, particularly when a child needs to
refer to a type in the parent.
Because they are linked using the type
identifiers, if the wrong parent is used then the wrong type will be
encountered.
.Lp
Each label is encoded in the file format using the following eight byte
structure:
.Bd -literal
typedef struct ctf_lblent {
	uint_t ctl_label;	/* ref to name of label */
	uint_t ctl_typeidx;	/* last type associated with this label */
} ctf_lblent_t;
.Ed
.Lp
Each label has two different components, a name and a type identifier.
The name is encoded in the
.Em ctl_label
member which is in the format defined in the section
.Sx String Identifiers .
Generally, the names of all labels are found in the internal string
section.
.Lp
The type identifier encoded in the member
.Em ctl_typeidx
refers to the last type identifier that a label refers to in the current
file.
Labels only refer to types in the current file.
If the
.Nm
file is a child, then it will have the same label as its parent;
however, its label will only refer to its types, not its parent's.
.Lp
It is also possible, though rather uncommon, for a
.Nm
file to have multiple labels.
Labels are placed one after another, every
eight bytes.
When multiple labels are present, types may only belong to
a single label.
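.Lp
As a minimal sketch of how a consumer might walk the label section, the
fragment below derives the number of labels from the two header offsets and
then finds the label that covers a given type identifier.
The names
.Sy lblent_t ,
.Sy label_count ,
and
.Sy label_for_type
are hypothetical, the fixed-width
.Sy uint32_t
stands in for
.Sy uint_t ,
and the lookup assumes that labels are stored in ascending order of
.Em ctl_typeidx ,
which this page does not state explicitly.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors ctf_lblent_t; uint32_t stands in for uint_t (an assumption). */
typedef struct lblent {
	uint32_t ctl_label;	/* ref to name of label */
	uint32_t ctl_typeidx;	/* last type associated with this label */
} lblent_t;

/*
 * The label section runs from cth_lbloff up to cth_objtoff, so the number
 * of eight byte entries is the difference of the offsets divided by eight.
 */
static size_t
label_count(uint32_t cth_lbloff, uint32_t cth_objtoff)
{
	if (cth_objtoff < cth_lbloff)
		return (0);
	return ((cth_objtoff - cth_lbloff) / sizeof (lblent_t));
}

/*
 * Return the index of the first label whose last type identifier is not
 * below id, or -1 if no label covers it. This relies on the assumption
 * that labels are sorted by ctl_typeidx.
 */
static long
label_for_type(const lblent_t *labels, size_t nlabels, uint32_t id)
{
	size_t i;

	for (i = 0; i < nlabels; i++) {
		if (id <= labels[i].ctl_typeidx)
			return ((long)i);
	}
	return (-1);
}
```
.Ed
.Lp
With the header values shown in the summary image (a label offset of 0x00
and an object offset of 0x08), this yields a single label.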
.Ss The Object Section
The object section provides a mapping from ELF symbols of type
.Sy STT_OBJECT
in the symbol table to a type identifier.
Every entry in this section is
a
.Sy uint16_t
which contains a type identifier as described in the section
.Sx Type Identifiers .
If there is no information for an object, then the type identifier 0x0
is stored for that entry.
.Lp
To walk the object section, you need to have a corresponding
.Sy symbol table
in the ELF object that contains the
.Nm
data.
Not every object is included in this section.
Specifically, when
walking the symbol table, an entry is skipped if it matches any of the
following conditions:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_OBJECT
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's section index is
.Sy SHN_ABS
and the value of the symbol is zero.
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Lp
The following sample code shows an example of iterating the object
section and skipping the correct symbols:
.Bd -literal
#include <gelf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Given the start of the object section in the CTF file, the number of symbols,
 * and the ELF Data sections for the symbol table and the string table, this
 * prints the type identifiers that correspond to objects. Note, a more robust
 * implementation should ensure that it does not walk beyond the end of the CTF
 * object section.
 */
static int
walk_symbols(uint16_t *objtoff, Elf_Data *symdata, Elf_Data *strdata,
    long nsyms)
{
	long i;
	const char *strbase = strdata->d_buf;

	for (i = 1; i < nsyms; i++) {
		const char *name;
		GElf_Sym sym;

		if (gelf_getsym(symdata, i, &sym) == NULL)
			return (1);

		if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
			continue;
		if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0)
			continue;
		if (sym.st_shndx == SHN_ABS && sym.st_value == 0)
			continue;
		name = strbase + sym.st_name;
		if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0)
			continue;

		(void) printf("Symbol %ld has type %u\n", i, (unsigned)*objtoff);

		/* Skipped symbols have no entry, so only advance here. */
		objtoff++;
	}

	return (0);
}
.Ed
.Ss The Function Section
The function section of the
.Nm
file encodes the types of both the function's arguments and the function's
return type.
Similar to
.Sx The Object Section ,
the function section encodes information for all symbols of type
.Sy STT_FUNC ,
excepting those that fit specific criteria.
Unlike with objects, because
functions have a variable number of arguments, they start with a type encoding
as defined in
.Sx Type Encoding ,
which is the size of a
.Sy uint16_t .
Functions which have no type information available are encoded as
.Li CTF_TYPE_INFO(CTF_K_UNKNOWN, 0, 0) .
Functions with arguments are encoded differently.
Here, the variable length is
turned into the number of arguments in the function.
If a function is a
.Sy varargs
type function, then the number of arguments is increased by one.
Functions with
type information are encoded as:
.Li CTF_TYPE_INFO(CTF_K_FUNCTION, 0, nargs) .
.Lp
For functions that have no type information, nothing else is encoded, and the
next function is encoded.
For functions with type information, the next
.Sy uint16_t
is encoded with the type identifier of the return type of the function.
It is
followed by each of the type identifiers of the arguments, if any exist, in the
order that they appear in the function.
Therefore, argument 0 is the first type
identifier and so on.
When a function has a final varargs argument, that is
encoded with the type identifier of zero.
.Lp
Like
.Sx The Object Section ,
the function section is encoded in the order of the symbol table.
It has
similar, but slightly different considerations from objects.
While iterating the
symbol table, if any of the following conditions are true, then the entry is
skipped and no corresponding entry is written:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_FUNC
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Ss The Type Section
The type section is the heart of the
.Nm
data.
It encodes all of the information about the types themselves.
The base of
the type information comes in two forms, a short form and a long form, each of
which may be followed by a variable number of arguments.
The following
definitions describe the short and long forms:
.Bd -literal
#define	CTF_MAX_SIZE	0xfffe	/* max size of a type in bytes */
#define	CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */
#define	CTF_MAX_LSIZE	UINT64_MAX

typedef struct ctf_stype {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* size of entire type in bytes */
		ushort_t _type;	/* reference to another type */
	} _u;
} ctf_stype_t;

typedef struct ctf_type {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* always CTF_LSIZE_SENT */
		ushort_t _type;	/* do not use */
	} _u;
	uint_t ctt_lsizehi;	/* high 32 bits of type size in bytes */
	uint_t ctt_lsizelo;	/* low 32 bits of type size in bytes */
} ctf_type_t;

#define	ctt_size _u._size	/* for fundamental types that have a size */
#define	ctt_type _u._type	/* for types that reference another type */
.Ed
.Pp
Type sizes are stored in
.Sy bytes .
The basic small form uses a
.Sy ushort_t
to store the number of bytes.
If the number of bytes in a structure would exceed
0xfffe, then the alternate form, the
.Sy ctf_type_t ,
is used instead.
To indicate that the larger form is being used, the member
.Em ctt_size
is set to the value of
.Sy CTF_LSIZE_SENT
(0xffff).
In general, when going through the type section, consumers use the
.Sy ctf_type_t
structure, but pay attention to the value of the member
.Em ctt_size
to determine whether they should increment their scan by the size of the
.Sy ctf_stype_t
or
.Sy ctf_type_t .
Not all kinds of types use
.Sy ctt_size .
Those which do not will always use the
.Sy ctf_stype_t
structure.
The individual sections for each kind have more information.
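.Lp
The choice between the short and long forms can be sketched as follows.
This is an illustration only: the names
.Sy type_hdr_t
and
.Sy type_size
are hypothetical, fixed-width integers stand in for
.Sy uint_t
and
.Sy ushort_t ,
and the function is only meaningful for kinds that use
.Em ctt_size .
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

#define	LSIZE_SENT	0xffff	/* same value as CTF_LSIZE_SENT */
#define	STYPE_LEN	8	/* bytes occupied by a ctf_stype_t */
#define	LTYPE_LEN	16	/* bytes occupied by a ctf_type_t */

/* Mirrors ctf_type_t using fixed-width members (an assumption). */
typedef struct type_hdr {
	uint32_t ctt_name;
	uint16_t ctt_info;
	uint16_t ctt_size;	/* shares storage with ctt_type */
	uint32_t ctt_lsizehi;	/* only present in the long form */
	uint32_t ctt_lsizelo;	/* only present in the long form */
} type_hdr_t;

/*
 * Return the size of the type in bytes and store in hdrlen how many bytes
 * of fixed header precede the variable data, so that a scan of the type
 * section knows how far to advance.
 */
static uint64_t
type_size(const type_hdr_t *tp, size_t *hdrlen)
{
	if (tp->ctt_size == LSIZE_SENT) {
		*hdrlen = LTYPE_LEN;
		return (((uint64_t)tp->ctt_lsizehi << 32) | tp->ctt_lsizelo);
	}

	*hdrlen = STYPE_LEN;
	return (tp->ctt_size);
}
```
.Ed
.Lp
A real consumer must also confirm that sixteen bytes are actually available
before reading the long form members.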
.Lp
Types are written out in order.
Therefore the first entry encountered has a type
id of 0x1, or 0x8000 if a child.
The member
.Em ctt_name
is encoded as described in the section
.Sx String Identifiers .
The string that it points to is the name of the type.
If the identifier points
to an empty string (one that consists solely of a null terminator) then the type
does not have a name.
This is common with anonymous structures and unions that
only have a typedef to name them, as well as with pointers and qualifiers.
.Lp
The next member, the
.Em ctt_info ,
is encoded as described in the section
.Sx Type Encoding .
The type's kind tells us how to interpret the remaining data in the
.Sy ctf_type_t
and any variable length data that may exist.
The rest of this section will be
broken down into the interpretation of the various kinds.
.Ss Encoding of Integers
Integers, which are of type
.Sy CTF_K_INTEGER ,
have no variable length arguments.
Instead, they are followed by a four byte
.Sy uint_t
which describes their encoding.
All integers must be encoded with a variable
length of zero.
The
.Em ctt_size
member describes the length of the integer in bytes.
In general, integer sizes
will be rounded up to the closest power of two.
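.Lp
The four byte encoding word can be unpacked with shifts and masks.
The sketch below assumes the layout this section documents for integers: the
encoding flags in the top eight bits, the offset in bits in the next eight,
and the size in bits in the low sixteen.
The names
.Sy int_enc_t
and
.Sy decode_int
are hypothetical.
.Bd -literal
```c
#include <stdint.h>

#define	INT_SIGNED	0x01	/* same value as CTF_INT_SIGNED */

/* The three fields packed into the word that follows an integer type. */
typedef struct int_enc {
	unsigned encoding;	/* flags such as INT_SIGNED */
	unsigned offset;	/* offset of the value in bits */
	unsigned bits;		/* size of the value in bits */
} int_enc_t;

/* Split the encoding word into its three fields. */
static int_enc_t
decode_int(uint32_t data)
{
	int_enc_t e;

	e.encoding = (data & 0xff000000) >> 24;
	e.offset = (data & 0x00ff0000) >> 16;
	e.bits = data & 0x0000ffff;
	return (e);
}

/* An integer is unsigned unless the signed flag is present. */
static int
int_is_signed(uint32_t data)
{
	return ((decode_int(data).encoding & INT_SIGNED) != 0);
}
```
.Ed
.Lp
For example, the word 0x01000020 would describe a signed integer that is 32
bits wide and starts at bit offset zero.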
.Lp
The integer encoding contains three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The encoding of the integer
.It
The offset in
.Sy bits
of the type
.It
The size in
.Sy bits
of the type
.El
.Pp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_INT_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_INT_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Pp
The following flags are defined for the encoding at this time:
.Bd -literal -offset indent
#define	CTF_INT_SIGNED		0x01
#define	CTF_INT_CHAR		0x02
#define	CTF_INT_BOOL		0x04
#define	CTF_INT_VARARGS		0x08
.Ed
.Lp
By default, an integer is considered to be unsigned, unless it has the
.Sy CTF_INT_SIGNED
flag set.
If the flag
.Sy CTF_INT_CHAR
is set, that indicates that the integer is of a type that stores character
data, for example the intrinsic C type
.Sy char
would have the
.Sy CTF_INT_CHAR
flag set.
If the flag
.Sy CTF_INT_BOOL
is set, that indicates that the integer represents a boolean type.
For example,
the intrinsic C type
.Sy _Bool
would have the
.Sy CTF_INT_BOOL
flag set.
Finally, the flag
.Sy CTF_INT_VARARGS
indicates that the integer is used as part of a variable number of arguments.
This encoding is rather uncommon.
.Ss Encoding of Floats
Floats, which are of type
.Sy CTF_K_FLOAT ,
are similar to their integer counterparts.
They have no variable length
arguments and are followed by a four byte encoding which describes the kind of
float that exists.
The
.Em ctt_size
member is the size, in bytes, of the float.
The float encoding has three
different pieces of information inside of it:
.Lp
.Bl -bullet -offset indent -compact
.It
The specific kind of float that exists
.It
The offset in
.Sy bits
of the float
.It
The size in
.Sy bits
of the float
.El
.Lp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_FP_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_FP_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_FP_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Lp
Whereas the encoding for integers was a series of flags, the encoding for
floats maps to a specific kind of float.
It is not a flag-based value.
The kinds of floats
correspond to both their size and their encoding.
This covers all of the basic C
intrinsic floating point types.
The following are the different kinds of floats
represented in the encoding:
.Bd -literal -offset indent
#define	CTF_FP_SINGLE	1	/* IEEE 32-bit float encoding */
#define	CTF_FP_DOUBLE	2	/* IEEE 64-bit float encoding */
#define	CTF_FP_CPLX	3	/* Complex encoding */
#define	CTF_FP_DCPLX	4	/* Double complex encoding */
#define	CTF_FP_LDCPLX	5	/* Long double complex encoding */
#define	CTF_FP_LDOUBLE	6	/* Long double encoding */
#define	CTF_FP_INTRVL	7	/* Interval (2x32-bit) encoding */
#define	CTF_FP_DINTRVL	8	/* Double interval (2x64-bit) encoding */
#define	CTF_FP_LDINTRVL	9	/* Long double interval (2x128-bit) encoding */
#define	CTF_FP_IMAGRY	10	/* Imaginary (32-bit) encoding */
#define	CTF_FP_DIMAGRY	11	/* Long imaginary (64-bit) encoding */
#define	CTF_FP_LDIMAGRY	12	/* Long double imaginary (128-bit) encoding */
.Ed
.Ss Encoding of Arrays
Arrays, which are of type
.Sy CTF_K_ARRAY ,
have no variable length arguments.
They are followed by a structure which
describes the number of elements in the array, the type identifier of the
elements in the array, and the type identifier of the index of the array.
With
arrays, the
.Em ctt_size
member is set to zero.
The structure that follows an array is defined as:
.Bd -literal
typedef struct ctf_array {
	ushort_t cta_contents;	/* reference to type of array contents */
	ushort_t cta_index;	/* reference to type of array index */
	uint_t cta_nelems;	/* number of elements */
} ctf_array_t;
.Ed
.Lp
The
.Em cta_contents
and
.Em cta_index
members of the
.Sy ctf_array_t
are type identifiers which are encoded as per the section
.Sx Type Identifiers .
The member
.Em cta_nelems
is a simple four byte unsigned count of the number of elements.
This count may
be zero when encountering C99's flexible array members.
.Ss Encoding of Functions
Function types, which are of type
.Sy CTF_K_FUNCTION ,
use the variable length to record the number of arguments in the function.
When
the function has a final member which is a varargs, then the argument count is
incremented by one to account for the variable argument.
Here, the
.Em ctt_type
member is encoded with the type identifier of the return type of the function.
Note that the
.Em ctt_size
member is not used here.
.Lp
The variable argument list contains the type identifiers for the arguments of
the function, if any.
Each one is represented by a
.Sy uint16_t
and encoded according to the
.Sx Type Identifiers
section.
If the function's last argument is of type varargs, then it is also
written out, but the type identifier is zero.
This is included in the count of
the function's arguments.
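.Lp
The same argument list convention is used by entries in
.Sx The Function Section ,
where the information word is followed directly by the return type and the
arguments.
The following sketch decodes one such function section entry.
The names
.Sy func_desc_t
and
.Sy decode_func
are hypothetical, and the input is assumed to have already been read into an
array of
.Sy uint16_t
values in host byte order.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

#define	K_UNKNOWN	0	/* same value as CTF_K_UNKNOWN */
#define	K_FUNCTION	5	/* same value as CTF_K_FUNCTION */

typedef struct func_desc {
	uint16_t ret;		/* type identifier of the return type */
	unsigned nargs;		/* named arguments, excluding varargs */
	int varargs;		/* non-zero if a trailing zero id is present */
} func_desc_t;

/*
 * Decode the entry that starts at p[0], where n values are available.
 * Returns the number of uint16_t values consumed, or 0 if the entry is
 * truncated or is not a function encoding.
 */
static size_t
decode_func(const uint16_t *p, size_t n, func_desc_t *fd)
{
	unsigned kind, vlen;

	if (n == 0)
		return (0);

	kind = (p[0] & 0xf800) >> 11;
	vlen = p[0] & 0x3ff;
	fd->ret = 0;
	fd->nargs = 0;
	fd->varargs = 0;

	/* No type information: only the information word is present. */
	if (kind == K_UNKNOWN && vlen == 0)
		return (1);
	if (kind != K_FUNCTION || n < 2 + (size_t)vlen)
		return (0);

	fd->ret = p[1];
	fd->nargs = vlen;

	/* A final argument with type identifier zero marks varargs. */
	if (vlen > 0 && p[1 + vlen] == 0) {
		fd->varargs = 1;
		fd->nargs--;
	}
	return (2 + (size_t)vlen);
}
```
.Ed
.Lp
The return value lets a caller step from one entry to the next while walking
the symbol table.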
.Ss Encoding of Structures and Unions
Structures and Unions, which are encoded with
.Sy CTF_K_STRUCT
and
.Sy CTF_K_UNION
respectively, are very similar constructs in C.
The main difference
between them is the fact that every member of a structure follows one another,
whereas in a union, all members share the same memory.
They are also very
similar in terms of their encoding in
.Nm .
The variable length argument for structures and unions represents the number of
members that they have.
The value of the member
.Em ctt_size
is the size of the structure or union.
There are two different structures which
are used to encode members in the variable list.
When the size of a structure or
union is greater than or equal to the large member threshold, 8192, then a
different structure is used to encode the members.
Within a given type, all members are encoded using
the same structure.
The structures for members are as follows:
.Bd -literal
typedef struct ctf_member {
	uint_t ctm_name;	/* reference to name in string table */
	ushort_t ctm_type;	/* reference to type of member */
	ushort_t ctm_offset;	/* offset of this member in bits */
} ctf_member_t;

typedef struct ctf_lmember {
	uint_t ctlm_name;	/* reference to name in string table */
	ushort_t ctlm_type;	/* reference to type of member */
	ushort_t ctlm_pad;	/* padding */
	uint_t ctlm_offsethi;	/* high 32 bits of member offset in bits */
	uint_t ctlm_offsetlo;	/* low 32 bits of member offset in bits */
} ctf_lmember_t;
.Ed
.Lp
Both the
.Em ctm_name
and
.Em ctlm_name
refer to the name of the member.
The name is encoded as an offset into the
string table as described by the section
.Sx String Identifiers .
The members
.Sy ctm_type
and
.Sy ctlm_type
both refer to the type of the member.
They are encoded as per the section
.Sx Type Identifiers .
.Lp
The last piece of information that is present is the offset in memory at
which the member begins.
For unions, this value will always
be zero because every member of a union begins at the start of the union.
For structures,
this is the offset in
.Sy bits
that the member begins at.
Note that a compiler may lay out a type with padding.
This means that the difference in offset between two consecutive members may be
larger than the size of the member.
When the size of the overall structure is
strictly less than 8192 bytes, the normal structure,
.Sy ctf_member_t ,
is used and the offset in bits is stored in the member
.Em ctm_offset .
However, when the size of the structure is greater than or equal to 8192 bytes,
then the number of bits is split into two 32-bit quantities.
One member,
.Em ctlm_offsethi ,
represents the upper 32 bits of the offset, while the other member,
.Em ctlm_offsetlo ,
represents the lower 32 bits of the offset.
These can be joined together to get
a 64-bit offset in bits by widening the member
.Em ctlm_offsethi
to 64 bits, shifting it to the left by 32 bits, and then performing a
bitwise OR with
.Em ctlm_offsetlo .
.Ss Encoding of Enumerations
Enumerations, denoted by the kind
.Sy CTF_K_ENUM ,
are similar to structures.
Enumerations use the variable length to note the number
of values that the enumeration contains, which we'll term enumerators.
In C, an
enumeration is always equivalent to the intrinsic type
.Sy int ;
thus the value of the member
.Em ctt_size
is always the size of an integer, which is determined by the current data
model.
For illumos systems, this will always be 4, as an integer is always defined to
be 4 bytes large in both
.Sy ILP32
and
.Sy LP64 ,
regardless of the architecture.
.Lp
The enumerators encoded in an enumeration have the following structure in the
variable list:
.Bd -literal
typedef struct ctf_enum {
	uint_t cte_name;	/* reference to name in string table */
	int cte_value;		/* value associated with this name */
} ctf_enum_t;
.Ed
.Lp
The member
.Em cte_name
refers to the name of the enumerator; it is encoded according to the
rules in the section
.Sx String Identifiers .
The member
.Em cte_value
contains the integer value of this enumerator.
.Ss Encoding of Forward References
Forward references, types of kind
.Sy CTF_K_FORWARD ,
in a
.Nm
file refer to types which may not have a definition at all, only a name.
If
the
.Nm
file is a child, then the forward may be resolved to an
actual type in the parent; otherwise the definition may be in another
.Nm
container or may not be known at all.
The only member of the
.Sy ctf_type_t
that matters for a forward declaration is the
.Em ctt_name ,
which points to the name of the forward reference in the string table as
described earlier.
There is no other information recorded for forward
references.
.Ss Encoding of Pointers, Typedefs, Volatile, Const, and Restrict
Pointers, typedefs, volatile, const, and restrict are all similar in
.Nm .
They all refer to another type.
In the case of typedefs, they provide an
alternate name, while volatile, const, and restrict change how the type is
interpreted in the C programming language.
This covers the
.Nm
kinds
.Sy CTF_K_POINTER ,
.Sy CTF_K_TYPEDEF ,
.Sy CTF_K_VOLATILE ,
.Sy CTF_K_RESTRICT ,
and
.Sy CTF_K_CONST .
.Lp
These types have no variable list entries and use the member
.Em ctt_type
to refer to the base type that they modify.
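.Lp
Because each of these kinds simply names another type through
.Em ctt_type ,
a consumer can follow the chain until it reaches a type that is not a
typedef or qualifier.
The following is a minimal sketch of that walk, not the
.Xr libctf 3
implementation.
The
.Sy ex_kind
enumeration,
.Sy ex_type
structure, and
.Sy ex_resolve
function are hypothetical, and the sketch assumes the type section has
already been decoded into a table indexed directly by type identifier,
with identifier zero unused.
.Bd -literal
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already decoded view of a type entry. */
enum ex_kind {
	EX_BASE,	/* any kind that is not listed below */
	EX_POINTER,
	EX_TYPEDEF,
	EX_VOLATILE,
	EX_CONST,
	EX_RESTRICT
};

struct ex_type {
	enum ex_kind kind;
	uint16_t ctt_type;	/* referenced type; unused for EX_BASE */
};

/*
 * Follow typedef, volatile, const, and restrict entries until
 * another kind is reached.  Pointers are deliberately not
 * followed.  Returns 0 if an identifier is out of range or the
 * chain does not terminate within ntypes steps.
 */
static uint16_t
ex_resolve(const struct ex_type *types, size_t ntypes, uint16_t id)
{
	size_t hops;

	for (hops = 0; hops < ntypes && id < ntypes; hops++) {
		switch (types[id].kind) {
		case EX_TYPEDEF:
		case EX_VOLATILE:
		case EX_CONST:
		case EX_RESTRICT:
			id = types[id].ctt_type;
			break;
		default:
			return (id);
		}
	}
	return (0);
}
.Ed
.Lp
Bounding the walk by the number of types is a design choice of this sketch;
it keeps a malformed container with a self-referencing typedef from looping
forever.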
.Ss Encoding of Unknown Types
Types with the kind
.Sy CTF_K_UNKNOWN
are used to indicate gaps in the type identifier space.
These entries consume an
identifier, but do not define anything.
Nothing should refer to these gap
identifiers.
.Ss Dependencies Between Types
C types can be imagined as a directed graph that may contain cycles.
Structures and unions may
refer to each other in a way that creates a cyclic dependency.
In cases such as
these, the entire type section must be read in and processed.
Consumers must
not assume that every type can be laid out in dependency order; they
cannot.
.Ss The String Section
The last section of the
.Nm
file is the
.Sy string
section.
This section encodes all of the strings that appear throughout
the other sections.
It is laid out as a sequence of strings, each a series of characters
followed by a null terminator.
Generally, all names are written out in ASCII, as
most C compilers do not allow any characters to appear in identifiers
outside of a subset of ASCII.
However, any characters from extended character sets
should be written out as a series of UTF-8 bytes.
.Lp
The first entry in the section, at offset zero, is a single null
terminator to reference the empty string.
Following that, each C string
should be written out, including the null terminator.
Offsets that refer
to something in this section should refer to the first byte which begins
a string.
Beyond the first byte in the section being the null
terminator, the order of strings is unimportant.
.Ss Data Encoding and ELF Considerations
.Nm
data is generally included in ELF objects which specify information to
identify the architecture and endianness of the file.
A
.Nm
container inside such an object must match the endianness of the ELF
object.
Aside from the question of the endian encoding of data, there
should be no other differences between architectures.
While many of the
types in this document refer to non-fixed size C integral types, they
are equivalent in the models
.Sy ILP32
and
.Sy LP64 .
If any other model is being used with
.Nm
data that has different sizes, then it must not use the model's sizes for
those integral types and instead use the fixed size equivalents based on an
.Sy ILP32
environment.
.Lp
When placing a
.Nm
container inside of an ELF object, there are certain conventions that are
expected for the purposes of tooling being able to find the
.Nm
data.
In particular, a given ELF object should only contain a single
.Nm
section.
Multiple containers should be merged together into a single
one.
.Lp
The
.Nm
file should be included in its own ELF section.
The section's name
must be
.Ql .SUNW_ctf .
The type of the section must be
.Sy SHT_PROGBITS .
The section should have a link set to the symbol table and its address
alignment must be 4.
.Sh SEE ALSO
.Xr dtrace 1 ,
.Xr elf 3 ,
.Xr gelf 3 ,
.Xr a.out 5 ,
.Xr elf 5