.TL
Adding Application Support for a New Architecture in Plan 9
.AU
Bob Flandrena
bobf@plan9.att.com
.SH
Introduction
.LP
Plan 9 has five classes of architecture-dependent software:
headers, kernels, compilers and loaders, the
.CW libc
system library, and a few application programs.  In general,
architecture-dependent programs
consist of a portable part shared by all architectures and a
processor-specific portion for each supported architecture.
The portable code is often compiled and stored in a library
associated with
each architecture.  A program is built by
compiling the architecture-specific code and loading it with the
library.  Support for a new architecture is provided
by building a compiler for the architecture, using it to
compile the portable code into libraries,
writing the architecture-specific code, and
then loading that code with
the libraries.
.LP
This document describes the organization of the architecture-dependent
code and headers on Plan 9.
The first section briefly discusses the layout of
the headers and the source code for the kernels, compilers, loaders, and the
system library,
.CW libc .
The second section provides a detailed
discussion of the structure of
.CW libmach ,
a library containing almost
all architecture-dependent code
used by application programs.
The final section describes the steps required to add
application program support for a new architecture.
.SH
Directory Structure
.PP
Architecture-dependent information for the new processor
is stored in the directory tree rooted at
.CW /\fIm
where
.I m
is the name of the new architecture (e.g.,
.CW mips ).
The new directory should be initialized with several important
subdirectories, notably
.CW bin ,
.CW include ,
and
.CW lib .
The directory tree of an existing architecture
serves as a good model for the new tree.
The architecture-dependent
.CW mkfile
must be stored in the newly created root directory
for the architecture.  It is easiest to copy the
mkfile for an existing architecture and modify
it for the new architecture.  When the mkfile
is correct, change the
.CW OS
and
.CW CPUS
entries in the mkfiles of all existing architectures
to reflect the addition of the new architecture.
.SH
Headers
.LP
Architecture-dependent headers are stored in the directory
.CW /\fIm\fP/include
where
.I m
is the name of the architecture (e.g.,
.CW mips ).
Three header files are required:
.CW u.h ,
.CW ureg.h ,
and
.CW stdarg.h .
The first defines fundamental data types
and bit settings for the floating point
status and control registers.  It
can usually be constructed by modifying an
existing
.CW u.h .
The
.CW ureg.h
file
contains a structure describing the layout
of the saved register set for
the architecture; it is defined by the kernel.
The
.CW stdarg.h
file
contains macros for addressing the elements
of a variable-length argument list.  This
header can usually be copied from an architecture
with a similar stack model.
.LP
Header file
.CW /sys/include/a.out.h
contains the definitions of the magic
numbers used to identify executables for
each architecture.  When support for a new
architecture is added, the magic number
for the architecture must be defined in this file.
.LP
The header format of a bootable executable is defined by
each manufacturer.  Header file
.CW /sys/include/bootexec.h
contains structures describing the headers currently
supported.  If the new architecture uses a common header
such as COFF,
the header format is probably already defined,
but if the bootable header format is non-standard,
a structure defining the format must be added to this file.
.LP
.SH
Kernel
.LP
Although the kernel depends critically on the properties of the underlying
hardware, most of the
higher-level kernel functions, including process
management, paging, pseudo-devices, and some
networking code, are independent of processor
architecture.  The portable kernel code
is divided into two parts: that implementing kernel
functions and that devoted to the boot process.
Code in the first class is stored in directory
.CW /sys/src/9/port
and the portable boot code is stored in
.CW /sys/src/9/boot .
Architecture-dependent kernel code is stored in the
subdirectories of
.CW /sys/src/9
named for each architecture.
.LP
The relationship between the kernel code and the boot code
is convoluted and subtle.  The portable boot code
is compiled into a library for each architecture.  An architecture-specific
main program is loaded with the appropriate library and the resulting
executable is compiled into the kernel, where it is executed as
a user process in the final stages of kernel initialization.  The boot process
performs authentication, attaches the name space root to the appropriate
file system, and starts the
.CW init
process.
.LP
The organization of the portable kernel source code differs from that
of most other architecture-specific code.
Instead of storing the portable code in a library
dedicated to the architecture and loading the architecture-specific
code with that library, the portable code is compiled into
the directory associated with the architecture and linked with the
architecture-specific object files stored there.
.LP
.SH
Compilers and Loaders
.LP
The compiler source code conforms to the usual
organization: portable code is compiled into a library
for each architecture
and the architecture-dependent code is loaded with
that library.
The common compiler code is stored in
.CW /sys/src/cmd/cc .
The
.CW mkfile
in this directory compiles the portable source and
archives the objects in a library for each architecture.
The architecture-specific compiler source
is stored in a subdirectory of
.CW /sys/src/cmd
with the same name as the compiler (e.g.,
.CW /sys/src/cmd/vc ).
.LP
The source code for the
Alef
compilers,
.CW val ,
.CW kal ,
and
.CW 8al ,
is organized in a similar manner.
The
.CW port
subdirectory of
.CW /sys/src/alef
contains the portable code;
subdirectories
.CW v ,
.CW k ,
and
.CW 8
contain the architecture-dependent portion of each compiler.
However, the build procedure for the
Alef
compilers is similar to that of the kernel; the
build is run from the architecture-specific
directory and the portable source is compiled
into object files in the build directory, where
it is loaded with the architecture-specific object files.
.LP
There is no portable code shared by the loaders.
Each directory of loader source
code is self-contained, except for
a header file and an instruction name table
included from the
directory of the associated
compiler.
.LP
.SH
Libraries
.LP
Most C library modules are
portable; the source code is stored in
directories
.CW /sys/src/libc/port
and
.CW /sys/src/libc/9sys .
Architecture-dependent library code
is stored in the subdirectory of
.CW /sys/src/libc
named the same as the target processor.
Non-portable functions not only
implement architecture-dependent operations
but also supply assembly language implementations
of functions where speed is critical.
Directory
.CW /sys/src/libc/9syscall
is unusual because it
contains architecture-dependent information
for all architectures.
It holds only a header file defining
the names and numbers of system calls
and a
.CW mkfile .
The
.CW mkfile
executes an
.CW rc
script that parses the header file, constructs
assembly language functions implementing the system
call for each architecture, assembles the code,
and archives the object files in
.CW libc .
The assembly language syntax and the system interface
differ for each architecture.
The
.CW rc
script in this
.CW mkfile
must be modified to support a new architecture.
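.LP
The real generator is an
.CW rc
script.  The C sketch below only illustrates its two steps for a
single header line: parse a
.CW #define
that gives a system call name and number, then expand a per-architecture
stub template.
The names
.CW parsesys
and
.CW emitstub
are hypothetical, as are the template format and the assumption that
each stub is named after the lowercased macro name.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one header line of the form "#define NAME NUMBER", storing the
 * lowercased name (assumed here to be the stub's function name) and
 * the number.  Returns 1 on success and 0 for lines that do not match.
 */
int
parsesys(const char *line, char *name, size_t namesz, int *num)
{
	char up[32];
	size_t i;

	if(sscanf(line, "#define %31s %d", up, num) != 2)
		return 0;
	if(strlen(up) >= namesz)
		return 0;
	for(i = 0; up[i] != 0; i++)
		name[i] = (char)tolower((unsigned char)up[i]);
	name[i] = 0;
	return 1;
}

/*
 * Expand a per-architecture stub template containing one %s (function
 * name) and one %d (call number).  Real templates are assembly code in
 * each architecture's own syntax; this one is only a placeholder.
 */
int
emitstub(char *buf, size_t n, const char *tmpl, const char *name, int num)
{
	return snprintf(buf, n, tmpl, name, num);
}
```

In this sketch, supporting a new architecture means supplying one more
template.  The real script must also encode that architecture's calling
convention in the stubs it generates.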
.LP
The source code for the
Alef
libraries, stored
in
.CW /sys/src/alef/lib ,
is organized similarly except
that directory
.CW 9syscall
is named
.CW sys .
Alef uses a different parameter passing
convention, so its system call interface code
differs from that of the C compilers.
.LP
.SH
Applications
.LP
Application programs process two forms of architecture-dependent
information: executable images and intermediate object files.
Almost all processing is on executable files.
System library
.CW libmach
provides functions to convert
architecture-specific data
to a portable format.  Application programs
then process this data independent of its
underlying representation.
Further, when a new architecture is implemented,
almost all code changes
are confined to the library;
most affected application programs need only be reloaded.
The source code for the library is stored in
.CW /sys/src/libmach .
.LP
An application program running on one type of
processor must be able to interpret
architecture-dependent information for all
supported processors.
For example, a debugger must be able to debug
the executables of
all architectures, not just the
architecture on which it is executing, since
.CW /proc
may be imported from a different machine.
.LP
A small part of the application library
provides functions to
extract symbol references from object files.
The remainder provides the following processing
of executable files or memory images:
.RS
.LP
.IP \(bu
Header interpretation.
.IP \(bu
Symbol table interpretation.
.IP \(bu
Execution context interpretation, such as stack traces
and stack frame location.
.IP \(bu
Instruction interpretation, including disassembly and
instruction size and follow-set calculations.
.IP \(bu
Exception and floating point number interpretation.
.IP \(bu
Architecture-independent read and write access through a
relocation map.
.RE
.LP
Header file
.CW /sys/include/mach.h
defines the interfaces to the
application library.  Manual pages
.I mach (2),
.I symbol (2),
and
.I object (2)
describe the details of the
library functions.
.LP
Two data structures, called
.CW Mach
and
.CW Machdata ,
contain architecture-dependent parameters and
a jump table of functions.
Global variables
.CW mach
and
.CW machdata
point to the
.CW Mach
and
.CW Machdata
data structures associated with the target architecture.
An application determines the target architecture of
a file or executable image, sets the global pointers
to the data structures associated with that architecture,
and subsequently performs all references indirectly through the
pointers.
As a result, direct references to the tables for each
architecture are avoided and the application code intrinsically
supports all architectures (though only one at a time).
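.LP
The indirection through
.CW mach
and
.CW machdata
can be sketched in a few lines of C.
The structures below are hypothetical, cut-down stand-ins for the real
.CW Mach
and
.CW Machdata
declared in
.CW mach.h ,
and the functions
.CW selectarch
and
.CW describe
are invented for illustration.
The sketch shows only that portable code calls through the global
pointers and never names an architecture-specific table.

```c
#include <string.h>

/*
 * Hypothetical, cut-down stand-ins for Mach and Machdata; the real
 * declarations in /sys/include/mach.h have many more fields.
 */
typedef struct Mach {
	const char *name;	/* architecture name */
	int pgsize;		/* page size in bytes (illustrative value) */
} Mach;

typedef struct Machdata {
	const char *(*excep)(int code);	/* one sample jump-table slot */
} Machdata;

static const char *mipsexcep(int code) { (void)code; return "mips: exception"; }
static const char *sparcexcep(int code) { (void)code; return "sparc: exception"; }

static Mach mmips = { "mips", 4096 };
static Mach msparc = { "sparc", 4096 };
static Machdata mipsmach = { mipsexcep };
static Machdata sparcmach = { sparcexcep };

/* All application references go through these two pointers. */
Mach *mach;
Machdata *machdata;

/* Point the globals at one architecture's tables; returns 0 if unknown. */
int
selectarch(const char *name)
{
	if(strcmp(name, "mips") == 0){
		mach = &mmips;
		machdata = &mipsmach;
	}else if(strcmp(name, "sparc") == 0){
		mach = &msparc;
		machdata = &sparcmach;
	}else
		return 0;
	return 1;
}

/* Portable code: it never names an architecture-specific table. */
const char *
describe(int code)
{
	return machdata->excep(code);
}
```

Once the globals point at the SPARC tables, a call to
.CW describe
dispatches to the SPARC entry without any change to the calling code.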
.LP
Object file processing is handled similarly: architecture-dependent
functions identify and
decode the intermediate files for the processor.
The application indirectly
invokes a classification function to identify
the architecture of the object code and select the
appropriate decoding function.  Subsequent calls
then use that function to decode each record.  Again,
the layer of indirection allows the application code
to support all architectures without modification.
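.LP
Object file classification can be sketched the same way.
In the C fragment below, each entry of a table pairs an identification
predicate with a record decoder.
.CW objclassify
selects an entry once, and
.CW countsyms
then decodes every record through it.
The record layout, tag bytes, and all names are invented for
illustration and do not describe the real Plan 9 object format.

```c
#include <stddef.h>

/* Hypothetical fixed record size for the made-up formats below. */
enum { RECSIZE = 4 };

/* One entry per architecture: an identification predicate and a
   decoder that extracts the symbol type from one record. */
typedef struct Objtype {
	const char *name;
	int (*is)(const unsigned char *buf, size_t n);
	int (*read)(const unsigned char *rec);
} Objtype;

/* Invented formats: a leading tag byte identifies the architecture,
   and each architecture keeps the symbol type in a different byte. */
static int amatch(const unsigned char *b, size_t n) { return n > 0 && b[0] == 0x41; }
static int bmatch(const unsigned char *b, size_t n) { return n > 0 && b[0] == 0x42; }
static int aread(const unsigned char *rec) { return rec[1]; }
static int bread(const unsigned char *rec) { return rec[2]; }

static Objtype objtab[] = {
	{ "a", amatch, aread },
	{ "b", bmatch, bread },
};

static Objtype *cur;	/* decoder chosen by objclassify */

/* Classification: select the decoder whose predicate accepts the file. */
int
objclassify(const unsigned char *buf, size_t n)
{
	size_t i;

	cur = NULL;
	for(i = 0; i < sizeof objtab / sizeof objtab[0]; i++)
		if(objtab[i].is(buf, n)){
			cur = &objtab[i];
			return 1;
		}
	return 0;
}

/* Portable code: count records of one symbol type via the chosen decoder.
   Returns -1 if no format has been selected. */
int
countsyms(const unsigned char *buf, size_t n, int type)
{
	size_t off;
	int count = 0;

	if(cur == NULL)
		return -1;
	for(off = 1; off + RECSIZE <= n; off += RECSIZE)
		if(cur->read(buf + off) == type)
			count++;
	return count;
}
```

Adding a format to this sketch means adding one table entry.
.CW countsyms
itself does not change.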
.LP
Splitting the architecture-dependent information
between the
.CW Mach
and
.CW Machdata
data structures
allows an application to select
an appropriate level of service.  Even though an application
does not directly reference the architecture-specific data structures,
it must load the
architecture-dependent tables and code
for all architectures it supports.  The size of this data
can be substantial, and many applications do not require
the full range of architecture-dependent functionality.
For example, the
.CW size
command does not require the disassemblers for every architecture;
it only needs to decode the header.
The
.CW Mach
data structure contains a few architecture-specific parameters
and a description of the processor register set.
The size of the structure
varies with the size of the register
set but is generally small.
The
.CW Machdata
data structure contains
a jump table of architecture-dependent functions;
the amount of code and data referenced by this table
is usually large.
An application that references only the
.CW Mach
structures therefore avoids loading the code referenced by the
.CW Machdata
tables.
.SH
Libmach Source Code Organization
.LP
The
.CW libmach
library provides four classes of functionality:
.LP
.IP "Header and Symbol Table Decoding\ -\ "
Files
.CW executable.c
and
.CW sym.c
contain code to interpret the header and
symbol tables of
an executable file or executing image.
Function
.CW crackhdr
decodes the header,
reformats the
information into an
.CW Fhdr
data structure, and points
global variable
.CW mach
to the
.CW Mach
data structure of the target architecture.
The symbol table processing
uses the data in the
.CW Fhdr
structure to decode the symbol table.
A variety of symbol table access functions support
queries on the reformatted table.
.IP "Debugger Support\ -\ "
Files named
.CW \fIm\fP.c ,
where
.I m
is the code letter assigned to the architecture,
contain the initialized
.CW Mach
data structure and the definition of the register
set for each architecture.
Architecture-specific debugger support functions and
an initialized
.CW Machdata
structure are stored in
files named
.CW \fIm\fPdb.c .
Files
.CW machdata.c
and
.CW setmach.c
contain debugger support functions shared
by multiple architectures.
.IP "Architecture-Independent Access\ -\ "
Files
.CW map.c ,
.CW access.c ,
and
.CW swap.c
provide access through a relocation map
to data in an executable file or executing image.
Byte-swapping is performed as needed.  Global variables
.CW mach
and
.CW machdata
must point to the
.CW Mach
and
.CW Machdata
data structures of the target architecture.
.IP "Object File Interpretation\ -\ "
These files contain functions to identify the
target architecture of an
intermediate object file
and extract references to symbols.  File
.CW obj.c
contains code common to all architectures;
file
.CW \fIm\fPobj.c
contains the architecture-specific source code
for the machine with code character
.I m .
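.LP
Header decoding can be illustrated with a small, hypothetical
table-driven decoder in the spirit of
.CW crackhdr .
It reads the magic number in big-endian order, finds the matching
table entry, decodes the remaining fields with that entry's byte order,
and points
.CW mach
at the entry's
.CW Mach
structure.
The header layout, magic numbers, type codes, and the names
.CW decodehdr ,
.CW getbe ,
and
.CW getle
are made up.  The real
.CW ExecTable
and
.CW Fhdr
structures contain more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real Mach, Fhdr, and ExecTable differ. */
typedef struct Mach {
	const char *name;
} Mach;

typedef struct Fhdr {
	const char *name;	/* description copied from the table */
	int type;		/* executable type code */
	uint32_t entry;		/* entry point decoded from the header */
} Fhdr;

typedef struct ExecTable {
	uint32_t magic;		/* magic number, compared big-endian */
	const char *name;
	int type;
	Mach *mach;
	uint32_t (*get4)(const unsigned char *);	/* byte order of other fields */
} ExecTable;

static uint32_t
getbe(const unsigned char *p)
{
	return (uint32_t)p[0]<<24 | (uint32_t)p[1]<<16 | (uint32_t)p[2]<<8 | p[3];
}

static uint32_t
getle(const unsigned char *p)
{
	return (uint32_t)p[3]<<24 | (uint32_t)p[2]<<16 | (uint32_t)p[1]<<8 | p[0];
}

static Mach mbe = { "hypothetical-be" };
static Mach mle = { "hypothetical-le" };

/* Magic numbers and type codes are invented for illustration. */
static ExecTable exectab[] = {
	{ 0x00000107, "big-endian executable", 1, &mbe, getbe },
	{ 0x00000207, "little-endian executable", 2, &mle, getle },
};

Mach *mach;

/*
 * crackhdr-style decoding of a made-up 8-byte header: magic number at
 * offset 0 (always big-endian), entry point at offset 4 (in the byte
 * order given by the matching table entry).  Returns 1 on success.
 */
int
decodehdr(const unsigned char *hdr, size_t n, Fhdr *fp)
{
	uint32_t magic;
	size_t i;

	if(n < 8)
		return 0;
	magic = getbe(hdr);
	for(i = 0; i < sizeof exectab / sizeof exectab[0]; i++)
		if(exectab[i].magic == magic){
			fp->name = exectab[i].name;
			fp->type = exectab[i].type;
			fp->entry = exectab[i].get4(hdr + 4);
			mach = exectab[i].mach;
			return 1;
		}
	return 0;
}
```

A program such as
.CW size
needs only this much: it never touches a
.CW Machdata
table.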
.LP
The
.CW Machdata
data structure is primarily a jump
table of architecture-dependent debugger support
functions.  Functions in
.CW setmach.c
select the
.CW Machdata
structure for a target architecture based
on the value of the
.CW type
code in the
.CW Fhdr
structure or the name of the architecture.
The jump table provides functions to swap bytes, interpret
machine instructions,
perform stack
traces, find stack frames, format floating point
numbers, and decode machine exceptions.  Some functions, such as
machine exception decoding, are idiosyncratic and must be
supplied for each architecture.  Others depend
on the compiler run-time model, and several
architectures may share code common to a model.  For
example, many architectures share the code to
process the fixed-frame stack model implemented by
several of the compilers.
Finally, some
functions, such as byte-swapping, provide a general capability, and
the jump table need only select an implementation appropriate
to the architecture.
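.LP
Byte-swapping is a case where the jump table merely selects a shared
routine.  The C sketch below defines two hypothetical routines,
.CW swalbe
and
.CW swalle ,
which are meant to behave like
.CW beswal
and
.CW leswal .
Each one takes a 32-bit value copied raw from target memory,
interprets its bytes in the target's byte order, and returns the value
in host order, regardless of the host's own byte order.
The
.CW Swapper
structure stands in for the
.CW swal
slot of
.CW Machdata .
None of this is the libmach source.

```c
#include <stdint.h>
#include <string.h>

/* Interpret the bytes of a raw 32-bit value as big-endian. */
static uint32_t
swalbe(uint32_t raw)
{
	unsigned char b[4];

	memcpy(b, &raw, 4);
	return (uint32_t)b[0]<<24 | (uint32_t)b[1]<<16 | (uint32_t)b[2]<<8 | b[3];
}

/* Interpret the bytes of a raw 32-bit value as little-endian. */
static uint32_t
swalle(uint32_t raw)
{
	unsigned char b[4];

	memcpy(b, &raw, 4);
	return (uint32_t)b[3]<<24 | (uint32_t)b[2]<<16 | (uint32_t)b[1]<<8 | b[0];
}

/* Hypothetical jump-table fragment: only the selection differs per architecture. */
typedef struct Swapper {
	uint32_t (*swal)(uint32_t);
} Swapper;

static Swapper bigendian = { swalbe };
static Swapper littleendian = { swalle };

/* Read a 32-bit word from a raw target image through the selected slot. */
uint32_t
getword(const Swapper *sw, const unsigned char *image, size_t off)
{
	uint32_t raw;

	memcpy(&raw, image + off, 4);
	return sw->swal(raw);
}
```

For the image bytes 12 34 56 78 (hexadecimal), the big-endian slot
yields 0x12345678 and the little-endian slot yields 0x78563412.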
.LP
.SH
Adding Application Support for a New Architecture
.LP
This section describes the
steps required to add application-level
support for a new architecture.
We assume
the kernel, compilers, loaders and system libraries
for the new architecture are already in place.  This
implies that a code character has been assigned and
that the architecture-specific headers have been
updated.
With the exception of two programs,
.CW prof
and
.CW pcc ,
application-level changes are confined to header
files and the source code in
.CW /sys/src/libmach .
.LP
.IP 1.
Begin by updating the application library
header file in
.CW /sys/include/mach.h .
Add the following symbolic codes to the
.CW enum
statement near the beginning of the file:
.RS
.IP \(bu
The processor type code, e.g.,
.CW MSPARC .
.IP \(bu
The type of the executable.  There are usually
two codes needed: one for a bootable
executable (i.e., a kernel) and one for an
application executable.
.IP \(bu
The disassembler type code.  Add one entry for
each supported disassembler for the architecture.
.IP \(bu
A symbolic code for the object file.
.RE
.LP
.IP 2.
In a file named
.CW /sys/src/libmach/\fIm\fP.c
(where
.I m
is the identifier character assigned to the architecture),
initialize
.CW Reglist
and
.CW Mach
data structures with values defining
the register set and various system parameters.
The source file for another architecture
can serve as a template.
Most of the fields of the
.CW Mach
data structure are obvious,
but a few require further explanation.
.RS
.IP "\f(CWkbase\fP\ -\ "
This field
contains the address of the kernel
.CW ublock .
The debuggers
assume the first entry of the kernel
.CW ublock
points to the
.CW Proc
structure for a kernel thread.
.IP "\f(CWktmask\fP\ -\ "
This field
is a bit mask used to calculate the kernel text address from
the kernel
.CW ublock
address.
The first page of the
kernel text segment is calculated by
ANDing
the bitwise complement of this mask with
.CW kbase .
.IP "\f(CWkspoff\fP\ -\ "
This field
contains the byte offset in the
.CW Proc
data structure of the saved kernel
stack pointer for a suspended kernel thread.  This
is the offset of the
.CW sched.sp
field of a
.CW Proc
table entry.
.IP "\f(CWkpcoff\fP\ -\ "
This field contains the byte offset into the
.CW Proc
data structure
of
the program counter of a suspended kernel thread.
This is the offset of the
.CW sched.pc
field in that structure.
.IP "\f(CWkspdelta\fP and \f(CWkpcdelta\fP\ -\ "
These fields
contain corrections to be added to
the stack pointer and program counter, respectively,
to properly locate the stack and next
instruction of a kernel thread.  These
values bias the saved registers retrieved
from the
.CW Label
structure named
.CW sched
in the
.CW Proc
data structure.
Most architectures require no bias,
so these fields contain zeros.
.IP "\f(CWscalloff\fP\ -\ "
This field
contains the byte offset of the
.CW scallnr
field in the
.CW ublock
data structure associated with a process.
The
.CW scallnr
field contains the number of the
last system call executed by the process.
The location of the field varies depending on
the size of the floating point register set,
which precedes it in the
.CW ublock .
.RE
.LP
.IP 3.
Update the initialization of the
.CW ExecTable
data structure at the beginning of file
.CW /sys/src/libmach/executable.c
with data for the new architecture.
Most architectures
require two entries: one for
a normal executable and
one for a bootable
image.  Each table entry contains:
.RS
.IP \(bu
Magic Number\ \-\
The big-endian magic number assigned to the architecture in
.CW /sys/include/a.out.h .
.IP \(bu
Name\ \-\
A string describing the executable.
.IP \(bu
Executable type code\ \-\
The executable type code assigned in
.CW /sys/include/mach.h .
.IP \(bu
\f(CWMach\fP pointer\ \-\
The address of the initialized
.CW Mach
data structure constructed in Step 2.
You must also add the name of this table to the
list of
.CW Mach
table definitions immediately preceding the
.CW ExecTable
initialization.
.IP \(bu
Header size\ \-\
The number of bytes in the executable file header.
The size of a normal executable header is always
.CW sizeof(Exec) .
The size of a bootable header is
determined by the size of the structure
for the architecture defined in
.CW /sys/include/bootexec.h .
.IP \(bu
Byte-swapping function\ \-\
The address of
.CW beswal
or
.CW leswal
for big-endian and little-endian
architectures, respectively.
.IP \(bu
Decoder function\ \-\
The address of a function to decode the header.
Function
.CW adotout
decodes the common header shared by all normal
(i.e., non-bootable) executable files.
The header format of bootable
executable files is defined by the manufacturer, and
a custom function is almost always
required to decode it.
Header file
.CW /sys/include/bootexec.h
contains data structures defining the bootable
headers for all architectures.  If the new architecture
uses an existing format, the appropriate
decoding function should already be in
.CW executable.c .
If the header format is unique, then
a new function must be added to this file.
Usually the decoding function for an existing
architecture can be adapted with minor modifications.
.RE
.LP
.IP 4.
Write an object file parser and
store it in file
.CW /sys/src/libmach/\fIm\fPobj.c
where
.I m
is the identifier character assigned to the architecture.
Two functions are required: a predicate to identify an
object file for the architecture and a function to extract
symbol references from the object code.
The object code format is obscure, but
it is often possible to adapt the
code of an existing architecture
with minor modifications.
When these
functions are in hand, insert their addresses
in the jump table at the beginning of file
.CW /sys/src/libmach/obj.c .
.LP
.IP 5.
Implement the required debugger support functions and
initialize the parameters and jump table of the
.CW Machdata
data structure for the architecture.
Conventionally, this code is stored in
a file named
.CW /sys/src/libmach/\fIm\fPdb.c
where
.I m
is the identifier character assigned to the architecture.
The fields of the
.CW Machdata
structure are:
.RS
.IP "\f(CWbpinst\fP and \f(CWbpsize\fP\ -\ "
These fields
contain the breakpoint instruction and the size
of the instruction, respectively.
.IP "\f(CWswab\fP\ -\ "
This field
contains the address of a function to
byte-swap a 16-bit value.  Choose
.CW leswab
or
.CW beswab
for little-endian or big-endian architectures, respectively.
.IP "\f(CWswal\fP\ -\ "
This field
contains the address of a function to
byte-swap a 32-bit value.  Choose
.CW leswal
or
.CW beswal
for little-endian or big-endian architectures, respectively.
.IP "\f(CWctrace\fP\ -\ "
This field
contains the address of a function to perform a
C-language stack trace.  Two general trace functions,
.CW risctrace
and
.CW cisctrace ,
traverse fixed-frame and relative-frame stacks,
respectively.  If the compiler for the
new architecture conforms to one of
these models, select the appropriate function.  If the
stack model is unique,
supply a custom stack trace function.
.IP "\f(CWfindframe\fP\ -\ "
This field
contains the address of a function to locate the stack
frame associated with a text address.
Generic functions
.CW riscframe
and
.CW ciscframe
process fixed-frame and relative-frame stack
models, respectively.
.IP "\f(CWufixup\fP\ -\ "
This field
contains the address of a function to adjust
the base address of the register save area.
Currently, only the
68020 requires this bias
to offset over the active
exception frame.
.IP "\f(CWexcep\fP\ -\ "
This field
contains the address of a function to produce a
text
string describing the
current exception.
Each architecture stores exception
information uniquely, so this code must always be supplied.
.IP "\f(CWbpfix\fP\ -\ "
This field
contains the address of a function to adjust an
address prior to laying down a breakpoint.
.IP "\f(CWsftos\fP\ -\ "
This field
contains the address of a function to convert a single-precision
floating point value
to a string.  Choose
.CW leieeesftos
for little-endian
or
.CW beieeesftos
for big-endian architectures.
.IP "\f(CWdftos\fP\ -\ "
This field
contains the address of a function to convert a double-precision
floating point value
to a string.  Choose
.CW leieeedftos
for little-endian
or
.CW beieeedftos
for big-endian architectures.
.IP "\f(CWfoll\fP, \f(CWdas\fP, \f(CWhexinst\fP, and \f(CWinstsize\fP\ -\ "
These fields point to functions that interpret machine
instructions.
They rely on disassembly of the instruction
and are unique to each architecture.
.CW Foll
calculates the follow set of an instruction.
.CW Das
disassembles a machine instruction to assembly language.
.CW Hexinst
formats a machine instruction as a text
string of
hexadecimal digits.
.CW Instsize
calculates the size, in bytes, of an instruction.
Once the disassembler is written, the other functions
can usually be implemented as trivial extensions of it.
.LP
It is possible to provide support for a new architecture
incrementally by filling the jump table entries
of the
.CW Machdata
structure as code is written.  In general, if
a jump table entry contains a zero, application
programs requiring that function will issue an
error message instead of attempting to
call the function.  For example,
the
.CW foll ,
.CW das ,
.CW hexinst ,
and
.CW instsize
jump table slots can be zeroed until a
disassembler is written.
Other capabilities, such as
stack trace or variable inspection,
can be supplied and will be available to
the debuggers, but attempts to use the
disassembler will result in an error message.
.RE
.IP 6.
Update the table named
.CW machines
near the beginning of
.CW /sys/src/libmach/setmach.c .
This table binds the
file type code and machine name to the
.CW Mach
and
.CW Machdata
structures of an architecture.
The names of the initialized
.CW Mach
and
.CW Machdata
structures built in Steps 2 and 5
must be added to the list of
structure definitions immediately
preceding the table initialization.
If both Plan 9 and
native disassembly are supported, add
an entry for each disassembler to the table.  The
entry for the default disassembler (usually
Plan 9) must be first.
.IP 7.
Add an entry describing the architecture to
the table named
.CW trans
near the end of
.CW /sys/src/cmd/prof.c .
.IP 8.
Add an entry describing the architecture to
the table named
.CW objtype
near the start of
.CW /sys/src/cmd/pcc.c .
.IP 9.
Recompile and install
all application programs that include header file
.CW mach.h
and load with
.CW libmach.a .
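.LP
The kernel-related
.CW Mach
fields described in Step 2 reduce to simple arithmetic.
The C sketch below uses hypothetical
.CW Proc
and
.CW Label
layouts and made-up
.CW kbase
and
.CW ktmask
values.  A real port takes all of these from its own kernel.
The sketch shows the text-base calculation
.CW "kbase & ~ktmask" ,
the offsets computed with
.CW offsetof ,
and the
.CW kspdelta
and
.CW kpcdelta
biases applied to the saved registers.
The functions
.CW ktextbase
and
.CW kthreadregs
are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel structures; a real port takes these from its kernel. */
typedef struct Label {
	uint32_t sp;
	uint32_t pc;
} Label;

typedef struct Proc {
	uint32_t pid;
	uint32_t state;
	Label sched;	/* saved context of a suspended kernel thread */
} Proc;

/* Cut-down Mach holding only the kernel fields described in Step 2. */
typedef struct Mach {
	uint32_t kbase;		/* address of the kernel ublock */
	uint32_t ktmask;	/* kbase & ~ktmask gives the first kernel text page */
	size_t kspoff;		/* offset of sched.sp in Proc */
	size_t kpcoff;		/* offset of sched.pc in Proc */
	int32_t kspdelta;	/* bias added to the saved sp */
	int32_t kpcdelta;	/* bias added to the saved pc */
} Mach;

/* Illustrative values only. */
static Mach mnew = {
	0xC0123000,	/* kbase */
	0x0FFFFFFF,	/* ktmask */
	offsetof(Proc, sched) + offsetof(Label, sp),
	offsetof(Proc, sched) + offsetof(Label, pc),
	0,		/* kspdelta: no bias */
	0,		/* kpcdelta: no bias */
};

/* First page of the kernel text segment. */
uint32_t
ktextbase(const Mach *m)
{
	return m->kbase & ~m->ktmask;
}

/*
 * Recover the saved stack pointer and pc of a suspended kernel thread
 * from a raw copy of its Proc entry (host byte order assumed here).
 */
void
kthreadregs(const Mach *m, const unsigned char *proc, uint32_t *sp, uint32_t *pc)
{
	uint32_t v;

	memcpy(&v, proc + m->kspoff, sizeof v);
	*sp = v + (uint32_t)m->kspdelta;
	memcpy(&v, proc + m->kpcoff, sizeof v);
	*pc = v + (uint32_t)m->kpcdelta;
}
```

With these values, the first page of kernel text is at 0xC0000000.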
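.LP
Laying down a breakpoint with the
.CW bpinst
and
.CW bpsize
fields from Step 5 amounts to saving the original bytes and copying
the breakpoint instruction over them.
This hypothetical C sketch makes several simplifications.
The 4-byte pattern is arbitrary, and the target text is a local byte
array rather than a process image.
The names
.CW Bp ,
.CW setbp ,
and
.CW clearbp
are invented.

```c
#include <string.h>

/* Hypothetical breakpoint description, mirroring bpinst/bpsize. */
typedef struct Bp {
	const unsigned char *bpinst;	/* breakpoint instruction bytes */
	int bpsize;			/* size of that instruction */
} Bp;

/* Arbitrary 4-byte pattern used only for illustration. */
static const unsigned char newbpinst[4] = { 0x00, 0x00, 0x00, 0x0d };
static Bp newbp = { newbpinst, sizeof newbpinst };

/*
 * Lay down a breakpoint at 'off' in a copy of the text segment, saving
 * the original bytes so the breakpoint can later be removed.
 * Returns 0 on success, -1 if the breakpoint would run off the end.
 */
int
setbp(const Bp *bp, unsigned char *text, size_t textsz, size_t off, unsigned char *save)
{
	if(off > textsz || textsz - off < (size_t)bp->bpsize)
		return -1;
	memcpy(save, text + off, bp->bpsize);
	memcpy(text + off, bp->bpinst, bp->bpsize);
	return 0;
}

/* Restore the instruction that setbp replaced. */
void
clearbp(const Bp *bp, unsigned char *text, size_t off, const unsigned char *save)
{
	memcpy(text + off, save, bp->bpsize);
}
```

A
.CW bpfix
function, where one is needed, would adjust
.CW off
before
.CW setbp
is called.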
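.LP
The
.CW sftos
routines from Step 5 differ only in how they assemble the four bytes of
the value.  The hypothetical
.CW ieeesftos
below takes the byte order as a flag, whereas libmach provides the separate
.CW leieeesftos
and
.CW beieeesftos
functions.
It assumes that the host also uses IEEE single precision, which is
true of common hosts.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format 4 bytes of target memory holding an IEEE single-precision
 * value.  'bigendian' selects the target byte order.  Returns the
 * snprintf result.
 */
int
ieeesftos(char *buf, size_t n, const unsigned char *mem, int bigendian)
{
	uint32_t bits;
	float f;

	if(bigendian)
		bits = (uint32_t)mem[0]<<24 | (uint32_t)mem[1]<<16 | (uint32_t)mem[2]<<8 | mem[3];
	else
		bits = (uint32_t)mem[3]<<24 | (uint32_t)mem[2]<<16 | (uint32_t)mem[1]<<8 | mem[0];
	memcpy(&f, &bits, sizeof f);	/* reinterpret the bits as a float */
	return snprintf(buf, n, "%g", f);
}
```

The bytes 3F C0 00 00 in big-endian order and 00 00 C0 3F in
little-endian order both format as 1.5.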
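.LP
The incremental approach described at the end of Step 5 relies on
applications checking a jump-table slot before calling through it.
In the hypothetical C sketch below, a partially complete port leaves
its disassembler slot zero, and the wrapper
.CW disassemble
reports an error instead of calling through that slot.
The two-slot structure and every function name are invented for
illustration.

```c
#include <stdio.h>

/* Hypothetical two-slot subset of a Machdata-style jump table. */
typedef struct Machdata {
	int (*das)(unsigned long pc, char *buf, int n);	/* disassembler */
	int (*ctrace)(unsigned long pc);		/* stack trace (illustrative) */
} Machdata;

static int
fakedas(unsigned long pc, char *buf, int n)
{
	snprintf(buf, n, "insn at %lx", pc);
	return 0;
}

static int
faketrace(unsigned long pc)
{
	(void)pc;
	return 0;
}

/* A port in progress: stack traces work, but the disassembler slot is zero. */
static Machdata partial = { 0, faketrace };
/* The same port once the disassembler is written. */
static Machdata complete = { fakedas, faketrace };

/* Application-side wrapper: report an error instead of calling a zero slot. */
int
disassemble(const Machdata *md, unsigned long pc, char *buf, int n)
{
	if(md->das == 0){
		snprintf(buf, n, "no disassembler for this architecture");
		return -1;
	}
	return md->das(pc, buf, n);
}
```

Filling in the slot later requires no change to the application code.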