@section mmo backend
The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX.  The simulator
@command{mmix}, which is available at
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz},
understands this format.  That package also includes a combined
assembler and linker called @command{mmixal}.  The mmo format has
no advantages feature-wise compared to e.g.@: ELF.  It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (which are not yet implemented in BFD).  See
@url{http://www-cs-faculty.stanford.edu/~knuth/mmix.html} for more
information about MMIX.  The ELF format is used for intermediate
object files in the BFD implementation.

@c We want to xref the symbol table node.  A feature in "chew"
@c requires that "commands" do not contain spaces in the
@c arguments.  Hence the hyphen in "Symbol-table".
@menu
* File layout::
* Symbol-table::
* mmo section mapping::
@end menu

@node File layout, Symbol-table, mmo, mmo
@subsection File layout
The contents of an mmo file are not partitioned into named sections
as with e.g.@: ELF.  Memory areas are formed by specifying the
location of the data that follows.
Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants) and the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data.  @xref{mmo section mapping}.

There is provision for specifying ``special data'' of 65536
different types.  We use type 80 (decimal), arbitrarily chosen to be
the same as the ELF @code{e_machine} number for MMIX, filling it with
section information normally found in ELF objects.  @xref{mmo
section mapping}.

Contents are entered as 32-bit words, XORed over the previous
contents, which are always zero-initialized.  A word that starts with
the byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes.  The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
the @samp{YZ} field (a 16-bit big-endian number), are used for
various purposes, different for each lopcode.  As documented in
@url{http://www-cs-faculty.stanford.edu/~knuth/mmixal-intro.ps.gz},
the lopcodes are:

@table @code
@item lop_quote
0x98000001.  The next word is contents, regardless of whether it
starts with 0x98 or not.

@item lop_loc
0x9801YYZZ, where @samp{Z} is 1 or 2.  This is a location
directive, setting the location for the next data to the next
32-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
plus @math{Y * 2^56}.  Normally @samp{Y} is 0 for the text segment
and 0x20 for the data segment.

@item lop_skip
0x9802YYZZ.  Increase the current location by @samp{YZ} bytes.

@item lop_fixo
0x9803YYZZ, where @samp{Z} is 1 or 2.  Store the current location
as 64 bits into the location pointed to by the next 32-bit
(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus
@math{Y * 2^56}.

@item lop_fixr
0x9804YYZZ.  @samp{YZ} is stored into the current location plus
@math{2 - 4 * YZ}.

@item lop_fixrx
0x980500ZZ.  @samp{Z} is 16 or 24.  A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
@samp{YZ} in lop_fixr: it is XORed into the current location
minus @math{4 * L}.  The first byte of the word is 0 or 1.  If it
is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}; if 0,
then @math{L = (@var{lowest 24 bits of word})}.

@item lop_file
0x9806YYZZ.  @samp{Y} is the file number, @samp{Z} is the count of
32-bit words.  Set the file number to @samp{Y} and the line
counter to 0.  The next @math{Z * 4} bytes contain the file name,
padded with zeros if the count is not a multiple of four.  The
same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
all but the first occurrence.

@item lop_line
0x9807YYZZ.  @samp{YZ} is the line number.  Together with
lop_file, it forms the source location for the next 32-bit word.
Note that for each non-lopcode 32-bit word, line numbers are
assumed to be incremented by one.

@item lop_spec
0x9808YYZZ.
@samp{YZ} is the type number.  Data until the next
lopcode other than lop_quote forms special data of type @samp{YZ}.
@xref{mmo section mapping}.

Types other than 80 (or type 80 with contents that do not
parse) are stored in sections named @code{.MMIX.spec_data.@var{n}}
where @var{n} is the @samp{YZ}-type.  The flags for such a
section say not to allocate or load the data.  The vma is 0.
Contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s.  The
location in data or code at which the lop_spec occurred is lost.

@item lop_pre
0x980901ZZ.  The first lopcode in a file.  The @samp{Z} field forms the
length of header information in 32-bit words, where the first word
tells the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.

@item lop_post
0x980a00ZZ.  @math{Z > 32}.  This lopcode follows after all
content-generating lopcodes in a program.  The @samp{Z} field
denotes the value of @samp{rG} at the beginning of the program.
The following @math{256 - Z} big-endian 64-bit words are loaded
into global registers @samp{$G} @dots{} @samp{$255}.

@item lop_stab
0x980b0000.  The next-to-last lopcode in a program.  Must follow
immediately after the lop_post lopcode and its data.  After this
lopcode follow all symbols in a compressed format
(@pxref{Symbol-table}).

@item lop_end
0x980cYYZZ.  The last lopcode in a program.  It must follow the
lop_stab lopcode and its data.
The @samp{YZ} field contains the
number of 32-bit words of symbol table information after the
preceding lop_stab lopcode.
@end table

Note that the lopcode ``fixups'', @code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo}, are not generated by BFD, but are handled.  They are
generated by @code{mmixal}.

This trivial one-label, one-instruction file:

@example
 :Main TRAP 1,2,3
@end example

can be represented this way in mmo:

@example
 0x98090101 - lop_pre, one 32-bit word with timestamp.
 <timestamp>
 0x98010002 - lop_loc, text segment, using a 64-bit address.
              Note that mmixal does not emit this for the file above.
 0x00000000 - Address, high 32 bits.
 0x00000000 - Address, low 32 bits.
 0x98060002 - lop_file, 2 32-bit words for file-name.
 0x74657374 - "test"
 0x2e730000 - ".s\0\0"
 0x98070001 - lop_line, line 1.
 0x00010203 - TRAP 1,2,3
 0x980a00ff - lop_post, setting $255 to 0.
 0x00000000
 0x00000000
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040   @xref{Symbol-table}.
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
 0x980c0005 - lop_end; symbol table contained five 32-bit words.
@end example

@node Symbol-table, mmo section mapping, File layout, mmo
@subsection Symbol table format
From mmixal.w (or really, the generated mmixal.tex) in
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz}:
``Symbols are stored and retrieved by means of a @samp{ternary
search trie}, following ideas of Bentley and Sedgewick. (See
ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
R.@:Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
Addison--Wesley, 1998), @samp{15.4}.)  Each trie node stores a
character, and there are branches to subtries for the cases where
a given character is less than, equal to, or greater than the
character in the trie.  There also is a pointer to a symbol table
entry if a symbol ends at the current node.''

So it's a tree encoded as a stream of bytes.  The stream of bytes
acts on a single virtual global symbol, adding and removing
characters and signalling complete symbol points.  Here, we read
the stream and create symbols at the completion points.

First, there's a control byte @code{m}.  If any of the listed bits
in @code{m} is nonzero, we execute what stands at the right, in
the listed order:

@example
 (MMO3_LEFT)
 0x40 - Traverse left trie.
        (Read a new command byte and recurse.)

 (MMO3_SYMBITS)
 0x2f - Read the next byte as a character and store it in the
        current character position; increment character position.
        Test the bits of @code{m}:

        (MMO3_WCHAR)
        0x80 - The character is 16-bit (so read another byte,
               merge into current character).

        (MMO3_TYPEBITS)
        0xf  - We have a complete symbol; parse the type, value
               and serial number and do what should be done
               with a symbol.  The type and length information
               is in j = (m & 0xf).

               (MMO3_REGQUAL_BITS)
               j == 0xf: A register variable.  The following
                         byte tells which register.
               j <= 8:   An absolute symbol.  Read j bytes as the
                         big-endian number the symbol equals.
                         A j = 2 with two zero bytes denotes an
                         unknown symbol.
               j > 8:    As with j <= 8, but add (0x20 << 56)
                         to the value in the following j - 8
                         bytes.

               Then comes the serial number, as a variant of
               uleb128, but better named ubeb128:
               Read bytes and shift the previous value left 7
               (multiply by 128).  Add in the new byte, repeat
               until a byte has bit 7 set.  The serial number
               is the computed value minus 128.

        (MMO3_MIDDLE)
        0x20 - Traverse middle trie.  (Read a new command byte
               and recurse.)  Decrement character position.

 (MMO3_RIGHT)
 0x10 - Traverse right trie.  (Read a new command byte and
        recurse.)
@end example

Let's look again at the @code{lop_stab} for the trivial file
(@pxref{File layout}).

@example
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
@end example

This forms the trivial trie (note that the path between ``:'' and
``M'' is redundant):

@example
 203a     ":"
 40       /
 40      /
 10      \
 40      /
 40     /
 204d  "M"
 2061  "a"
 2069  "i"
 016e  "n" is the last character in a full symbol, and
       with a value represented in one byte.
 00    The value is 0.
 81    The serial number is 1.
@end example

@node mmo section mapping, , Symbol-table, mmo
@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information.  If needed, any datum in the encapsulation will be
quoted using lop_quote.  First comes a 32-bit word holding the
number of 32-bit words containing the zero-terminated zero-padded
section name.  After the name there's a 32-bit word holding flags
describing the section type.  Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address.  Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary.
For a loadable
section (such as data or code), the contents might follow at some
later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents.  This in effect forms a descriptor that must be emitted
before the actual contents.  Sections described this way must not
overlap.

For areas that don't have such descriptors, synthetic sections are
formed by BFD.  Consecutive contents in the two memory areas
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
sections named @code{.text} and @code{.data} respectively.  If an area
is not otherwise described, but would together with a neighboring
lower area be less than @samp{0x40000000} bytes long, it is joined
with the lower area and the gap is zero-filled.  For other cases,
a new section is formed, named @code{.MMIX.sec.@var{n}}.  Here,
@var{n} is a number, a running count through the mmo file,
starting at 0.
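
The descriptor layout and the two default area names described above
can be sketched in Python as follows.  This is only an illustration and
not the BFD code: the function names @code{decode_section_descriptor}
and @code{default_section_name} are invented here, and the input is
assumed to be the list of 32-bit words that follow a lop_spec 80
lopcode, with any lop_quote words already removed.  The joining of
neighboring areas and the @code{.MMIX.sec.@var{n}} numbering are left
out.

```python
def decode_section_descriptor(words):
    """Decode a section descriptor from a list of 32-bit words.

    Hypothetical helper: `words` holds the data after a lop_spec 80
    lopcode, already unquoted.  Returns a tuple
    (name, flags, length, address, words_consumed).
    """
    name_words = words[0]
    raw = b"".join(w.to_bytes(4, "big") for w in words[1:1 + name_words])
    # The name is padded with zero bytes up to a 32-bit boundary.
    name = raw.rstrip(b"\0").decode("ascii")
    pos = 1 + name_words
    flags = words[pos]
    # Length and start address are 64-bit big-endian: high word first.
    length = (words[pos + 1] << 32) | words[pos + 2]
    address = (words[pos + 3] << 32) | words[pos + 4]
    return name, flags, length, address, pos + 5


def default_section_name(address):
    """Name of the synthetic section for an undescribed 64-bit address.

    Returns ".text" or ".data" for the two areas named in the text and
    None for anything else (which would get a .MMIX.sec.n name).
    """
    top = address >> 56
    if top <= 0x01:
        return ".text"
    if top == 0x20:
        return ".data"
    return None


if __name__ == "__main__":
    example = [0x00000002, 0x7365636e, 0x616d6500, 0x00000033,
               0x00000000, 0x0000001c, 0x00000000, 0x00000004]
    print(decode_section_descriptor(example))
    print(default_section_name(0x4))
```

With the eight words used in the usage part, the decoder yields the name
@samp{secname}, flags @samp{0x33}, a length of 28 bytes and start
address 4, having consumed all eight words.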

A loadable section specified as:

@example
 .section secname,"ax"
 TETRA 1,2,3,4,-1,-2009
 BYTE 80
@end example

and linked to address @samp{0x4}, is represented by the sequence:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x7365636e - "secn"
 0x616d6500 - "ame\0"
 0x00000033 - flags CODE, READONLY, LOAD, ALLOC
 0x00000000 - high 32 bits of section length
 0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
 0x00000000 - high 32 bits of section address
 0x00000004 - section address is 4
 0x98010002 - 64 bits with address of following data
 0x00000000 - high 32 bits of address
 0x00000004 - low 32 bits: data starts at address 4
 0x00000001 - 1
 0x00000002 - 2
 0x00000003 - 3
 0x00000004 - 4
 0xffffffff - -1
 0xfffff827 - -2009
 0x50000000 - 80 as a byte, padded with zeros.
@end example

Note that the lop_spec wrapping does not include the section
contents.
Compare this to a non-loaded section specified as:

@example
 .section thirdsec
 TETRA 200001,100002
 BYTE 38,40
@end example

This, when linked to address @samp{0x200000000000001c}, is
represented by:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x74686972 - "thir"
 0x64736563 - "dsec"
 0x00000010 - flag READONLY
 0x00000000 - high 32 bits of section length
 0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
 0x20000000 - high 32 bits of address
 0x0000001c - low 32 bits of address 0x200000000000001c
 0x00030d41 - 200001
 0x000186a2 - 100002
 0x26280000 - 38, 40 as bytes, padded with zeros
@end example

For the latter example, the section contents must not be
loaded into memory, and are therefore specified as part of the
special data.  The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.
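
A descriptor for such a non-loaded section could be assembled along the
following lines.  This is a sketch under stated assumptions, not the
BFD implementation: the helper names @code{pack_words}, @code{quote}
and @code{encode_nonloaded_section} are hypothetical, the name is
assumed to be zero-padded only up to the next 32-bit boundary (which is
what a name count of two for the eight-character name @samp{thirdsec}
implies), and the recorded length is assumed to be the padded length,
as in the listing above.

```python
LOP_QUOTE = 0x98000001         # lop_quote: next word is plain contents
LOP_SPEC_SECTION = 0x98080050  # lop_spec with YZ = 80 (decimal)


def pack_words(data):
    """Split bytes into big-endian 32-bit words, zero-padding the tail."""
    padded = data + b"\0" * (-len(data) % 4)
    return [int.from_bytes(padded[i:i + 4], "big")
            for i in range(0, len(padded), 4)]


def quote(words):
    """Put a lop_quote before every word whose first byte is 0x98."""
    out = []
    for w in words:
        if w >> 24 == 0x98:
            out.append(LOP_QUOTE)
        out.append(w)
    return out


def encode_nonloaded_section(name, flags, address, contents):
    """Build the word sequence for a non-loaded named section.

    Assumption: the length field holds the contents length rounded up
    to a multiple of four bytes.
    """
    name_words = pack_words(name.encode("ascii"))
    padded_length = len(contents) + (-len(contents) % 4)
    body = [len(name_words)] + name_words + [
        flags,
        padded_length >> 32, padded_length & 0xffffffff,
        address >> 32, address & 0xffffffff,
    ] + pack_words(contents)
    # Only the encapsulated data is quoted, never the lop_spec itself.
    return [LOP_SPEC_SECTION] + quote(body)


if __name__ == "__main__":
    data = ((200001).to_bytes(4, "big") + (100002).to_bytes(4, "big")
            + bytes([38, 40]))
    for w in encode_nonloaded_section("thirdsec", 0x10,
                                      0x200000000000001c, data):
        print("0x%08x" % w)
```

For the @samp{thirdsec} input used in the usage part, the two name
words come out as @samp{0x74686972} and @samp{0x64736563}, the ASCII
codes of ``thir'' and ``dsec'', and no word needs quoting.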