.HTML "Adding Application Support for a New Architecture in Plan 9
.TL
Adding Application Support for a New Architecture in Plan 9
.AU
Bob Flandrena
bobf@plan9.bell-labs.com
.SH
Introduction
.LP
Plan 9 has five classes of architecture-dependent software:
headers, kernels, compilers and loaders, the
.CW libc
system library, and a few application programs.  In general,
architecture-dependent programs
consist of a portable part shared by all architectures and a
processor-specific portion for each supported architecture.
The portable code is often compiled and stored in a library
associated with
each architecture.  A program is built by
compiling the architecture-specific code and loading it with the
library.  Support for a new architecture is provided
by building a compiler for the architecture, using it to
compile the portable code into libraries,
writing the architecture-specific code, and
then loading that code with
the libraries.
.LP
This document describes the organization of the architecture-dependent
code and headers on Plan 9.
The first section briefly discusses the layout of
the headers and the source code for the kernels, compilers, loaders, and the
system library,
.CW libc .
The second section provides a detailed
discussion of the structure of
.CW libmach ,
a library containing almost
all architecture-dependent code
used by application programs.
The final section describes the steps required to add
application program support for a new architecture.
.SH
Directory Structure
.PP
Architecture-dependent information for the new processor
is stored in the directory tree rooted at \f(CW/\fP\fIm\fP
where
.I m
is the name of the new architecture (e.g.,
.CW mips ).
The new directory should be initialized with several important
subdirectories, notably
.CW bin ,
.CW include ,
and
.CW lib .
The directory tree of an existing architecture
serves as a good model for the new tree.
The architecture-dependent
.CW mkfile
must be stored in the newly created root directory
for the architecture.  It is easiest to copy the
mkfile for an existing architecture and modify
it for the new architecture.  When the mkfile
is correct, change the
.CW OS
and
.CW CPUS
variables in
.CW /sys/src/mkfile.proto
to reflect the addition of the new architecture.
.SH
Headers
.LP
Architecture-dependent headers are stored in directory
.CW /\fIm\fP/include
where
.I m
is the name of the architecture (e.g.,
.CW mips ).
Two header files are required:
.CW u.h
and
.CW ureg.h .
The first defines fundamental data types,
bit settings for the floating point
status and control registers, and
.CW va_list
processing which depends on the stack
model for the architecture.  This file
is best built by copying and modifying the
.CW u.h
file from an architecture
with a similar stack model.
The
.CW ureg.h
file
contains a structure describing the layout
of the saved register set for
the architecture; it is defined by the kernel.
.LP
Header file
.CW /sys/include/a.out.h
contains the definitions of the magic
numbers used to identify executables for
each architecture.  When support for a new
architecture is added, the magic number
for the architecture must be added to this file.
.LP
The header format of a bootable executable is defined by
each manufacturer.  Header file
.CW /sys/include/bootexec.h
contains structures describing the headers currently
supported.  If the new architecture uses a common header
such as COFF,
the header format is probably already defined,
but if the bootable header format is non-standard,
a structure defining the format must be added to this file.
.LP
.SH
Kernel
.LP
Although the kernel depends critically on the properties of the underlying
hardware, most of the
higher-level kernel functions, including process
management, paging, pseudo-devices, and some
networking code, are independent of processor
architecture.  The portable kernel code
is divided into two parts: that implementing kernel
functions and that devoted to the boot process.
Code in the first class is stored in directory
.CW /sys/src/9/port
and the portable boot code is stored in
.CW /sys/src/9/boot .
Architecture-dependent kernel code is stored in the
subdirectories of
.CW /sys/src/9
named for each architecture.
.LP
The relationship between the kernel code and the boot code
is convoluted and subtle.  The portable boot code
is compiled into a library for each architecture.  An architecture-specific
main program is loaded with the appropriate library and the resulting
executable is compiled into the kernel where it is executed as
a user process during the final stages of kernel initialization.  The boot process
performs authentication, attaches the name space root to the appropriate
file system and starts the
.CW init
process.
.LP
The organization of the portable kernel source code differs from that
of most other architecture-specific code.
Instead of storing the portable code in a library
and loading it with the architecture-specific
code, the portable code is compiled directly into
the directory containing the architecture-specific code
and linked with the object files built from the source in that directory.
.LP
.SH
Compilers and Loaders
.LP
The compiler source code conforms to the usual
organization: portable code is compiled into a library
for each architecture
and the architecture-dependent code is loaded with
that library.
The common compiler code is stored in
.CW /sys/src/cmd/cc .
The
.CW mkfile
in this directory compiles the portable source and
archives the objects in a library for each architecture.
The architecture-specific compiler source
is stored in a subdirectory of
.CW /sys/src/cmd
with the same name as the compiler (e.g.,
.CW /sys/src/cmd/vc ).
.LP
There is no portable code shared by the loaders.
Each directory of loader source
code is self-contained, except for
a header file and an instruction name table
included from the
directory of the associated
compiler.
.LP
.SH
Libraries
.LP
Most C library modules are
portable; the source code is stored in
directories
.CW /sys/src/libc/port
and
.CW /sys/src/libc/9sys .
Architecture-dependent library code
is stored in the subdirectory of
.CW /sys/src/libc
named the same as the target processor.
Non-portable functions not only
implement architecture-dependent operations
but also supply assembly language implementations
of functions where speed is critical.
Directory
.CW /sys/src/libc/9syscall
is unusual because it
contains architecture-dependent information
for all architectures.
It holds only a header file defining
the names and numbers of system calls
and a
.CW mkfile .
The
.CW mkfile
executes an
.CW rc
script that parses the header file, constructs
assembler language functions implementing the system
call for each architecture, assembles the code,
and archives the object files in
.CW libc .
The assembler language syntax and the system interface
differ for each architecture.
The
.CW rc
script in this
.CW mkfile
must be modified to support a new architecture.
.LP
.SH
Applications
.LP
Application programs process two forms of architecture-dependent
information: executable images and intermediate object files.
Almost all processing is on executable files.
System library
.CW libmach
provides functions that convert
architecture-specific data
to a portable format so application programs
can process this data independent of its
underlying representation.
Further, when a new architecture is implemented
almost all code changes
are confined to the library;
most affected application programs need only be reloaded.
The source code for the library is stored in
.CW /sys/src/libmach .
.LP
An application program running on one type of
processor must be able to interpret
architecture-dependent information for all
supported processors.
For example, a debugger must be able to debug
the executables of
all architectures, not just the
architecture on which it is executing, since
.CW /proc
may be imported from a different machine.
.LP
A small part of the application library
provides functions to
extract symbol references from object files.
The remainder provides the following processing
of executable files or memory images:
.RS
.LP
.IP \(bu
Header interpretation.
.IP \(bu
Symbol table interpretation.
.IP \(bu
Execution context interpretation, such as stack traces
and stack frame location.
.IP \(bu
Instruction interpretation including disassembly and
instruction size and follow-set calculations.
.IP \(bu
Exception and floating point number interpretation.
.IP \(bu
Architecture-independent read and write access through a
relocation map.
.RE
.LP
Header file
.CW /sys/include/mach.h
defines the interfaces to the
application library.  Manual pages
.I mach (2),
.I symbol (2),
and
.I object (2)
describe the details of the
library functions.
.LP
Two data structures, called
.CW Mach
and
.CW Machdata ,
contain architecture-dependent parameters and
a jump table of functions.
Global variables
.CW mach
and
.CW machdata
point to the
.CW Mach
and
.CW Machdata
data structures associated with the target architecture.
An application determines the target architecture of
a file or executable image, sets the global pointers
to the data structures associated with that architecture,
and subsequently performs all references indirectly through the
pointers.
As a result, direct references to the tables for each
architecture are avoided and the application code intrinsically
supports all architectures (though only one at a time).
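.LP
The indirection through the two global pointers can be sketched in portable C.
This is a minimal illustration under stated assumptions, not the library's code:
the structures are cut down to a few invented fields, and the tables
\f(CWmachA\fP, \f(CWmachB\fP, \f(CWdataA\fP, and \f(CWdataB\fP
and the functions \f(CWsettarget\fP and \f(CWdecodeword\fP are hypothetical names.

```c
/* Simplified, hypothetical stand-ins for the Mach and Machdata tables. */
typedef struct Mach Mach;
typedef struct Machdata Machdata;

struct Mach {
	const char *name;	/* architecture name */
	int pgsize;		/* page size in bytes (illustrative field) */
};

struct Machdata {
	int bpsize;				/* size of the breakpoint instruction */
	unsigned long (*swal)(unsigned long);	/* decode a 32-bit value */
};

/* target byte order matches the host: nothing to do */
static unsigned long
idswal(unsigned long v)
{
	return v;
}

/* target byte order is the reverse of the host: swap the four bytes */
static unsigned long
revswal(unsigned long v)
{
	return ((v & 0xffUL) << 24) | ((v & 0xff00UL) << 8) |
		((v >> 8) & 0xff00UL) | ((v >> 24) & 0xffUL);
}

/* one pair of tables per architecture; names and values are invented */
static Mach machA = { "archA", 4096 };
static Machdata dataA = { 4, idswal };
static Mach machB = { "archB", 8192 };
static Machdata dataB = { 1, revswal };

/* the globals through which all later references are made */
Mach *mach = &machA;
Machdata *machdata = &dataA;

/* select the tables for a target architecture */
void
settarget(Mach *m, Machdata *d)
{
	mach = m;
	machdata = d;
}

/* application code never names machA or machB directly */
unsigned long
decodeword(unsigned long raw)
{
	return machdata->swal(raw);
}
```

Because only the pointers change, supporting another architecture in this sketch
means adding one more pair of tables; \f(CWdecodeword\fP is untouched.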
.LP
Object file processing is handled similarly: architecture-dependent
functions identify and
decode the intermediate files for the processor.
The application indirectly
invokes a classification function to identify
the architecture of the object code and to select the
appropriate decoding function.  Subsequent calls
then use that function to decode each record.  Again,
the layer of indirection allows the application code
to support all architectures without modification.
.LP
Splitting the architecture-dependent information
between the
.CW Mach
and
.CW Machdata
data structures
allows applications to choose
an appropriate level of service.  Even though an application
does not directly reference the architecture-specific data structures,
it must load the
architecture-dependent tables and code
for all architectures it supports.  The size of this data
can be substantial and many applications do not require
the full range of architecture-dependent functionality.
For example, the
.CW size
command does not require the disassemblers for every architecture;
it only needs to decode the header.
The
.CW Mach
data structure contains a few architecture-specific parameters
and a description of the processor register set.
The size of the structure
varies with the size of the register
set but is generally small.
The
.CW Machdata
data structure contains
a jump table of architecture-dependent functions;
the amount of code and data referenced by this table
is usually large.
.SH
Libmach Source Code Organization
.LP
The
.CW libmach
library provides four classes of functionality:
.LP
.IP "Header and Symbol Table Decoding\ -\ "
Files
.CW executable.c
and
.CW sym.c
contain code to interpret the header and
symbol tables of
an executable file or executing image.
Function
.CW crackhdr
decodes the header,
reformats the
information into an
.CW Fhdr
data structure, and points
global variable
.CW mach
to the
.CW Mach
data structure of the target architecture.
The symbol table processing
uses the data in the
.CW Fhdr
structure to decode the symbol table.
A variety of symbol table access functions then support
queries on the reformatted table.
.IP "Debugger Support\ -\ "
Files named
.CW \fIm\fP.c ,
where
.I m
is the code letter assigned to the architecture,
contain the initialized
.CW Mach
data structure and the definition of the register
set for each architecture.
Architecture-specific debugger support functions and
an initialized
.CW Machdata
structure are stored in
files named
.CW \fIm\fPdb.c .
Files
.CW machdata.c
and
.CW setmach.c
contain debugger support functions shared
by multiple architectures.
.IP "Architecture-Independent Access\ -\ "
Files
.CW map.c ,
.CW access.c ,
and
.CW swap.c
provide accesses through a relocation map
to data in an executable file or executing image.
Byte-swapping is performed as needed.  Global variables
.CW mach
and
.CW machdata
must point to the
.CW Mach
and
.CW Machdata
data structures of the target architecture.
.IP "Object File Interpretation\ -\ "
The files in this group contain functions to identify the
target architecture of an
intermediate object file
and extract references to symbols.  File
.CW obj.c
contains code common to all architectures;
file
.CW \fIm\fPobj.c
contains the architecture-specific source code
for the machine with code character
.I m .
.LP
The
.CW Machdata
data structure is primarily a jump
table of architecture-dependent debugger support
functions. Functions select the
.CW Machdata
structure for a target architecture based
on the value of the
.CW type
code in the
.CW Fhdr
structure or the name of the architecture.
The jump table provides functions to swap bytes, interpret
machine instructions,
perform stack
traces, find stack frames, format floating point
numbers, and decode machine exceptions.  Some functions, such as
machine exception decoding, are idiosyncratic and must be
supplied for each architecture.  Others depend
on the compiler run-time model and several
architectures may share code common to a model.  For
example, many architectures share the code to
process the fixed-frame stack model implemented by
several of the compilers.
Finally, some
functions, such as byte-swapping, provide a general capability and
the jump table need only select an implementation appropriate
to the architecture.
.LP
.SH
Adding Application Support for a New Architecture
.LP
This section describes the
steps required to add application-level
support for a new architecture.
We assume
the kernel, compilers, loaders and system libraries
for the new architecture are already in place.  This
implies that a code character has been assigned and
that the architecture-specific headers have been
updated.
With the exception of two programs,
application-level changes are confined to header
files and the source code in
.CW /sys/src/libmach .
.LP
.IP 1.
Begin by updating the application library
header file in
.CW /sys/include/mach.h .
Add the following symbolic codes to the
.CW enum
statement near the beginning of the file:
.RS
.IP \(bu
The processor type code, e.g.,
.CW MSPARC .
.IP \(bu
The type of the executable.  There are usually
two codes needed: one for a bootable
executable (i.e., a kernel) and one for an
application executable.
.IP \(bu
The disassembler type code.  Add one entry for
each supported disassembler for the architecture.
.IP \(bu
A symbolic code for the object file.
.RE
.LP
.IP 2.
In a file named
.CW /sys/src/libmach/\fIm\fP.c
(where
.I m
is the identifier character assigned to the architecture),
initialize
.CW Reglist
and
.CW Mach
data structures with values defining
the register set and various system parameters.
The source file for a similar architecture
can serve as a template.
Most of the fields of the
.CW Mach
data structure are obvious
but a few require further explanation.
.RS
.IP "\f(CWkbase\fP\ -\ "
This field
contains the address of the kernel
.CW ublock .
The debuggers
assume the first entry of the kernel
.CW ublock
points to the
.CW Proc
structure for a kernel thread.
.IP "\f(CWktmask\fP\ -\ "
This field
is a bit mask used to calculate the kernel text address from
the kernel
.CW ublock
address.
The first page of the
kernel text segment is calculated by
ANDing
the negation of this mask with
.CW kbase .
.IP "\f(CWkspoff\fP\ -\ "
This field
contains the byte offset in the
.CW Proc
data structure to the saved kernel
stack pointer for a suspended kernel thread.  This
is the offset to the
.CW sched.sp
field of a
.CW Proc
table entry.
.IP "\f(CWkpcoff\fP\ -\ "
This field contains the byte offset into the
.CW Proc
data structure
of
the program counter of a suspended kernel thread.
This is the offset to
field
.CW sched.pc
in that structure.
.IP "\f(CWkspdelta\fP and \f(CWkpcdelta\fP\ -\ "
These fields
contain corrections to be added to
the stack pointer and program counter, respectively,
to properly locate the stack and next
instruction of a kernel thread.  These
values bias the saved registers retrieved
from the
.CW Label
structure named
.CW sched
in the
.CW Proc
data structure.
Most architectures require no bias
and these fields contain zeros.
.IP "\f(CWscalloff\fP\ -\ "
This field
contains the byte offset of the
.CW scallnr
field in the
.CW ublock
data structure associated with a process.
The
.CW scallnr
field contains the number of the
last system call executed by the process.
The location of the field varies depending on
the size of the floating point register set
which precedes it in the
.CW ublock .
.RE
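.LP
The arithmetic behind \f(CWktmask\fP and the two delta fields is small enough to show directly.
The sketch below assumes that "negation" of the mask means its bitwise complement;
the helper names \f(CWktextbase\fP and \f(CWbiased\fP are hypothetical, and any
addresses passed to them are made up rather than taken from a real architecture.

```c
/*
 * Hypothetical helpers illustrating the kbase, ktmask, kspdelta and
 * kpcdelta fields.  Assumption: "negation" is the bitwise complement.
 */

/* first page of kernel text: kbase with the masked low bits cleared */
unsigned long
ktextbase(unsigned long kbase, unsigned long ktmask)
{
	return kbase & ~ktmask;
}

/* apply a kspdelta or kpcdelta correction to a saved register value */
unsigned long
biased(unsigned long saved, long delta)
{
	return saved + (unsigned long)delta;
}
```

With a zero delta, as on most architectures, \f(CWbiased\fP returns the saved value unchanged.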
.LP
.IP 3.
Add an entry to the initialization of the
.CW ExecTable
data structure at the beginning of file
.CW /sys/src/libmach/executable.c .
Most architectures
require two entries: one for
a normal executable and
one for a bootable
image.  Each table entry contains:
.RS
.IP \(bu
Magic Number\ \-
The big-endian magic number assigned to the architecture in
.CW /sys/include/a.out.h .
.IP \(bu
Name\ \-
A string describing the executable.
.IP \(bu
Executable type code\ \-
The executable code assigned in
.CW /sys/include/mach.h .
.IP \(bu
\f(CWMach\fP pointer\ \-
The address of the initialized
.CW Mach
data structure constructed in Step 2.
You must also add the name of this table to the
list of
.CW Mach
table definitions immediately preceding the
.CW ExecTable
initialization.
.IP \(bu
Header size\ \-
The number of bytes in the executable file header.
The size of a normal executable header is always
.CW sizeof(Exec) .
The size of a bootable header is
determined by the size of the structure
for the architecture defined in
.CW /sys/include/bootexec.h .
.IP \(bu
Byte-swapping function\ \-
The address of
.CW beswal
or
.CW leswal
for big-endian and little-endian
architectures, respectively.
.IP \(bu
Decoder function\ \-
The address of a function to decode the header.
Function
.CW adotout
decodes the common header shared by all normal
(i.e., non-bootable) executable files.
The header format of bootable
executable files is defined by the manufacturer and
a custom function is almost always
required to decode it.
Header file
.CW /sys/include/bootexec.h
contains data structures defining the bootable
headers for all architectures.  If the new architecture
uses an existing format, the appropriate
decoding function should already be in
.CW executable.c .
If the header format is unique, then
a new function must be added to this file.
Usually the decoding function for an existing
architecture can be adopted with minor modifications.
.RE
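.LP
The way a table of this shape is searched can be sketched as follows.
This is an illustration only: the structure \f(CWExecEntry\fP is a reduced, hypothetical
stand-in for an
.CW ExecTable
entry (it omits the \f(CWMach\fP pointer, the byte-swapping function, and the decoder),
the type codes \f(CWFNEWARCH\fP and \f(CWFNEWARCHB\fP are invented,
and the magic numbers and header sizes are made-up values, not ones assigned in
.CW a.out.h .

```c
#include <stddef.h>

/* Reduced, hypothetical stand-in for one ExecTable entry. */
typedef struct ExecEntry ExecEntry;
struct ExecEntry {
	unsigned long magic;	/* big-endian magic number */
	const char *name;	/* description of the executable */
	int type;		/* executable type code */
	size_t hsize;		/* header size in bytes */
};

enum { FNEWARCH = 1, FNEWARCHB };	/* invented type codes */

/* one entry for a normal executable, one for a bootable image */
static ExecEntry exectab[] = {
	{ 0x00000a17UL, "newarch plan 9 executable", FNEWARCH, 32 },
	{ 0x7f000a17UL, "newarch plan 9 boot image", FNEWARCHB, 64 },
	{ 0, NULL, 0, 0 },
};

/* assemble a 32-bit big-endian value from four bytes */
unsigned long
be32(const unsigned char *p)
{
	return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
		((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/* return the entry whose magic number starts hdr, or NULL if none does */
const ExecEntry *
findexec(const unsigned char *hdr)
{
	unsigned long magic = be32(hdr);
	const ExecEntry *e;

	for (e = exectab; e->name != NULL; e++)
		if (e->magic == magic)
			return e;
	return NULL;
}
```

Reading the magic number byte by byte keeps the lookup independent of the byte order
of the machine running the application.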
.LP
.IP 4.
Write an object file parser and
store it in file
.CW /sys/src/libmach/\fIm\fPobj.c
where
.I m
is the identifier character assigned to the architecture.
Two functions are required: a predicate to identify an
object file for the architecture and a function to extract
symbol references from the object code.
The object code format is obscure but
it is often possible to adopt the
code of an existing architecture
with minor modifications.
When these
functions are in hand, insert their addresses
in the jump table at the beginning of file
.CW /sys/src/libmach/obj.c .
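.LP
A jump table pairing a predicate with a decoder might look like the sketch below.
Everything in it is an assumption made for illustration: the structure \f(CWObjfmt\fP,
the function \f(CWobjclassify\fP, the two formats, and their marker bytes are invented
and do not describe the real object formats or the table in
.CW obj.c .

```c
#include <stddef.h>

/* Hypothetical jump-table entry: a predicate and a record decoder. */
typedef struct Objfmt Objfmt;
struct Objfmt {
	const char *name;
	int (*is)(const unsigned char *buf, size_t n);	/* predicate */
	int (*symlen)(const unsigned char *rec);	/* decode one record */
};

/*
 * Invented formats: a two-byte marker, then records whose first
 * byte carries the length of a symbol name.
 */
static int
isx(const unsigned char *buf, size_t n)
{
	return n >= 2 && buf[0] == 0x58 && buf[1] == 0x01;
}

static int
isy(const unsigned char *buf, size_t n)
{
	return n >= 2 && buf[0] == 0x59 && buf[1] == 0x01;
}

static int
symlenx(const unsigned char *rec)
{
	return rec[0];
}

static int
symleny(const unsigned char *rec)
{
	return rec[0] & 0x7f;	/* this format reserves the top bit */
}

static Objfmt objfmts[] = {
	{ "archX", isx, symlenx },
	{ "archY", isy, symleny },
};

/* try each predicate in turn; return the matching entry or NULL */
const Objfmt *
objclassify(const unsigned char *buf, size_t n)
{
	size_t i;

	for (i = 0; i < sizeof objfmts / sizeof objfmts[0]; i++)
		if (objfmts[i].is(buf, n))
			return &objfmts[i];
	return NULL;
}
```

A caller classifies the file once and then decodes every record through the
returned entry, so adding a format means adding one row to the table.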
.LP
.IP 5.
Implement the required debugger support functions and
initialize the parameters and jump table of the
.CW Machdata
data structure for the architecture.
This code is conventionally stored in
a file named
.CW /sys/src/libmach/\fIm\fPdb.c
where
.I m
is the identifier character assigned to the architecture.
The fields of the
.CW Machdata
structure are:
.RS
.IP "\f(CWbpinst\fP and \f(CWbpsize\fP\ -\ "
These fields
contain the breakpoint instruction and the size
of the instruction, respectively.
.IP "\f(CWswab\fP\ -\ "
This field
contains the address of a function to
byte-swap a 16-bit value.  Choose
.CW leswab
or
.CW beswab
for little-endian or big-endian architectures, respectively.
.IP "\f(CWswal\fP\ -\ "
This field
contains the address of a function to
byte-swap a 32-bit value.  Choose
.CW leswal
or
.CW beswal
for little-endian or big-endian architectures, respectively.
.IP "\f(CWctrace\fP\ -\ "
This field
contains the address of a function to perform a
C-language stack trace.  Two general trace functions,
.CW risctrace
and
.CW cisctrace ,
traverse fixed-frame and relative-frame stacks,
respectively.  If the compiler for the
new architecture conforms to one of
these models, select the appropriate function.  If the
stack model is unique,
supply a custom stack trace function.
.IP "\f(CWfindframe\fP\ -\ "
This field
contains the address of a function to locate the stack
frame associated with a text address.
Generic functions
.CW riscframe
and
.CW ciscframe
process fixed-frame and relative-frame stack
models.
.IP "\f(CWufixup\fP\ -\ "
This field
contains the address of a function to adjust
the base address of the register save area.
Currently, only the
68020 requires this bias
to offset over the active
exception frame.
.IP "\f(CWexcep\fP\ -\ "
This field
contains the address of a function to produce a
text
string describing the
current exception.
Each architecture stores exception
information uniquely, so this code must always be supplied.
.IP "\f(CWbpfix\fP\ -\ "
This field
contains the address of a function to adjust an
address prior to laying down a breakpoint.
.IP "\f(CWsftos\fP\ -\ "
This field
contains the address of a function to convert a single
precision floating point value
to a string.  Choose
.CW leieeesftos
for little-endian
or
.CW beieeesftos
for big-endian architectures.
.IP "\f(CWdftos\fP\ -\ "
This field
contains the address of a function to convert a double
precision floating point value
to a string.  Choose
.CW leieeedftos
for little-endian
or
.CW beieeedftos
for big-endian architectures.
.IP "\f(CWfoll\fP, \f(CWdas\fP, \f(CWhexinst\fP, and \f(CWinstsize\fP\ -\ "
These fields point to functions that interpret machine
instructions.
They rely on disassembly of the instruction
and are unique to each architecture.
.CW Foll
calculates the follow set of an instruction.
.CW Das
disassembles a machine instruction to assembly language.
.CW Hexinst
formats a machine instruction as a text
string of
hexadecimal digits.
.CW Instsize
calculates the size, in bytes, of an instruction.
Once the disassembler is written, the other functions
can usually be implemented as trivial extensions of it.
.LP
It is possible to provide support for a new architecture
incrementally by filling the jump table entries
of the
.CW Machdata
structure as code is written.  In general, if
a jump table entry contains a zero, application
programs requiring that function will issue an
error message instead of attempting to
call the function.  For example,
the
.CW foll ,
.CW das ,
.CW hexinst ,
and
.CW instsize
jump table slots can be zeroed until a
disassembler is written.
Other capabilities, such as
stack trace or variable inspection,
can be supplied and will be available to
the debuggers but attempts to use the
disassembler will result in an error message.
.RE
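.LP
The check that makes incremental support safe can be sketched as a guard around
each call through the jump table.
The code is a minimal illustration, not the library's own: the structure is a
two-field stand-in for
.CW Machdata ,
and \f(CWinstsizeat\fP, \f(CWpartial\fP, \f(CWcomplete\fP, and \f(CWfixed4\fP
are hypothetical names.

```c
#include <stddef.h>

/* Two-field, hypothetical stand-in for the Machdata jump table. */
typedef struct Machdata Machdata;
struct Machdata {
	int bpsize;				/* size of the breakpoint instruction */
	int (*instsize)(unsigned long pc);	/* zero until a disassembler exists */
};

/* every instruction is four bytes on this invented architecture */
static int
fixed4(unsigned long pc)
{
	(void)pc;
	return 4;
}

static Machdata partial = { 4, NULL };		/* no disassembler yet */
static Machdata complete = { 4, fixed4 };

/*
 * Return the size of the instruction at pc, or -1 with *err set
 * when the table entry has not been filled in.
 */
int
instsizeat(const Machdata *md, unsigned long pc, const char **err)
{
	if (md->instsize == NULL) {
		*err = "no disassembler for this architecture";
		return -1;
	}
	*err = NULL;
	return md->instsize(pc);
}
```

With the \f(CWpartial\fP table a caller gets a message instead of a jump through a
zero pointer, while entries that are present keep working.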
.IP 6.
Update the table named
.CW machines
near the beginning of
.CW /sys/src/libmach/setmach.c .
This table binds the
file type code and machine name to the
.CW Mach
and
.CW Machdata
structures of an architecture.
The names of the initialized
.CW Mach
and
.CW Machdata
structures built in steps 2 and 5
must be added to the list of
structure definitions immediately
preceding the table initialization.
If both Plan 9 and
native disassembly are supported, add
an entry for each disassembler to the table.  The
entry for the default disassembler (usually
Plan 9) must be first.
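.LP
Why the default disassembler must come first is easiest to see in a lookup that
stops at the first match.
The sketch below is an assumption about the shape of such a table, not a copy of
.CW setmach.c :
the structure \f(CWMachtab\fP omits the \f(CWMach\fP and \f(CWMachdata\fP pointers,
and the names \f(CWbytype\fP, \f(CWbyname\fP, \f(CWFNEWARCH\fP, \f(CWANEWARCH\fP,
and \f(CWANEWARCHNATIVE\fP are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Reduced, hypothetical stand-in for one entry of the machines table. */
typedef struct Machtab Machtab;
struct Machtab {
	const char *name;	/* machine name */
	int type;		/* executable file type code */
	int asstype;		/* disassembler type code */
};

enum { FNEWARCH = 1 };				/* invented file type code */
enum { ANEWARCH = 1, ANEWARCHNATIVE };		/* invented disassembler codes */

static Machtab machines[] = {
	{ "newarch", FNEWARCH, ANEWARCH },		/* default: listed first */
	{ "newarch-native", FNEWARCH, ANEWARCHNATIVE },
	{ NULL, 0, 0 },
};

/* lookup by file type returns the first, and so the default, entry */
const Machtab *
bytype(int type)
{
	const Machtab *m;

	for (m = machines; m->name != NULL; m++)
		if (m->type == type)
			return m;
	return NULL;
}

/* lookup by machine name can reach any entry */
const Machtab *
byname(const char *name)
{
	const Machtab *m;

	for (m = machines; m->name != NULL; m++)
		if (strcmp(m->name, name) == 0)
			return m;
	return NULL;
}
```

In this sketch both entries share a file type code, so only a lookup by name can
select the second disassembler.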
.IP 7.
Add an entry describing the architecture to
the table named
.CW trans
near the end of
.CW /sys/src/cmd/prof.c .
.IP 8.
Add an entry describing the architecture to
the table named
.CW objtype
near the start of
.CW /sys/src/cmd/pcc.c .
.IP 9.
Recompile and install
all application programs that include header file
.CW mach.h
and load with
.CW libmach.a .
885.CW libmach.a .
886