1.HTML "A Manual for the Plan 9 assembler
2.ft CW
3.ta 8n +8n +8n +8n +8n +8n +8n
4.ft
5.TL
6A Manual for the Plan 9 assembler
7.AU
8Rob Pike
9rob@plan9.bell-labs.com
10.SH
11Machines
12.PP
13There is an assembler for each of the MIPS, SPARC, Intel 386,
14Intel 960, AMD 29000, Motorola 68020 and 68000, Motorola Power PC,
15AMD64, DEC Alpha, and Acorn ARM.
16The 68020 assembler,
17.CW 2a ,
18is the oldest and in many ways the prototype.
19The assemblers are really just variations of a single program:
20they share many properties such as left-to-right assignment order for
21instruction operands and the synthesis of macro instructions
22such as
23.CW MOVE
24to hide the peculiarities of the load and store structure of the machines.
25To keep things concrete, the first part of this manual is
26specifically about the 68020.
27At the end is a description of the differences among
28the other assemblers.
29.PP
30The document, ``How to Use the Plan 9 C Compiler'', by Rob Pike,
31is a prerequisite for this manual.
32.SH
33Registers
34.PP
35All pre-defined symbols in the assembler are upper-case.
36Data registers are
37.CW R0
38through
39.CW R7 ;
40address registers are
41.CW A0
42through
43.CW A7 ;
44floating-point registers are
45.CW F0
46through
47.CW F7 .
48.PP
49A pointer in
50.CW A6
51is used by the C compiler to point to data, enabling short addresses to
52be used more often.
53The value of
54.CW A6
55is constant and must be set during C program initialization
56to the address of the externally-defined symbol
57.CW a6base .
58.PP
59The following hardware registers are defined in the assembler; their
60meaning should be obvious given a 68020 manual:
61.CW CAAR ,
62.CW CACR ,
63.CW CCR ,
64.CW DFC ,
65.CW ISP ,
66.CW MSP ,
67.CW SFC ,
68.CW SR ,
69.CW USP ,
70and
71.CW VBR .
72.PP
73The assembler also defines several pseudo-registers that
74manipulate the stack:
75.CW FP ,
76.CW SP ,
77and
78.CW TOS .
79.CW FP
80is the frame pointer, so
81.CW 0(FP)
82is the first argument,
83.CW 4(FP)
84is the second, and so on.
85.CW SP
86is the local stack pointer, where automatic variables are held
87(SP is a pseudo-register only on the 68020);
88.CW 0(SP)
89is the first automatic, and so on as with
90.CW FP .
91Finally,
92.CW TOS
93is the top-of-stack register, used for pushing parameters to procedures,
94saving temporary values, and so on.
95.PP
96The assembler and loader track these pseudo-registers so
97the above statements are true regardless of what has been
98pushed on the hardware stack, pointed to by
99.CW A7 .
100The name
101.CW A7
102refers to the hardware stack pointer, but beware of mixed use of
103.CW A7
104and the above stack-related pseudo-registers, which will cause trouble.
Note, too, that the loader observes that the
.CW PEA
instruction alters
.CW SP
and inserts a corresponding pop before every return.
109The assembler accepts a label-like name to be attached to
110.CW FP
111and
112.CW SP
113uses, such as
114.CW p+0(FP) ,
115to help document that
116.CW p
117is the first argument to a routine.
118The name goes in the symbol table but has no significance to the result
119of the program.
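.PP
A minimal sketch of these conventions, assuming a hypothetical routine
.CW addone
with one argument and one automatic; the names
.CW n
and
.CW tmp
are illustrative labels only:
.P1
TEXT	addone(SB), $4
	MOVL	n+0(FP), R0	/* first argument */
	ADDL	$1, R0
	MOVL	R0, tmp+0(SP)	/* first automatic, in the 4 bytes reserved by $4 */
	MOVL	tmp+0(SP), R0	/* reload it, purely to show the SP form */
	RTS
.P2
The store and reload serve no purpose beyond showing both pseudo-registers in use.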
120.SH
121Referring to data
122.PP
123All external references must be made relative to some pseudo-register,
124either
125.CW PC
126(the virtual program counter) or
127.CW SB
128(the ``static base'' register).
129.CW PC
130counts instructions, not bytes of data.
131For example, to branch to the second following instruction, that is,
132to skip one instruction, one may write
133.P1
134	BRA	2(PC)
135.P2
136Labels are also allowed, as in
137.P1
138	BRA	return
139	NOP
140return:
141	RTS
142.P2
143When using labels, there is no
144.CW (PC)
145annotation.
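.PP
As a small sketch of a PC-relative branch, the fragment below skips one instruction when
.CW R0
is zero.
The register choices are arbitrary, and the mnemonics
.CW TSTL
and
.CW BEQ
are assumed to be spelled as they appear in compiler output:
.P1
	TSTL	R0
	BEQ	2(PC)		/* R0 is zero: skip the ADDL */
	ADDL	$1, R1
	RTS
.P2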
146.PP
147The pseudo-register
148.CW SB
149refers to the beginning of the address space of the program.
150Thus, references to global data and procedures are written as
151offsets to
152.CW SB ,
153as in
154.P1
155	MOVL	$array(SB), TOS
156.P2
157to push the address of a global array on the stack, or
158.P1
159	MOVL	array+4(SB), TOS
160.P2
161to push the second (4-byte) element of the array.
162Note the use of an offset; the complete list of addressing modes is given below.
163Similarly, subroutine calls must use
164.CW SB :
165.P1
166	BSR	exit(SB)
167.P2
168File-static variables have syntax
169.P1
170	local<>+4(SB)
171.P2
172The
173.CW <>
174will be filled in at load time by a unique integer.
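.PP
A hedged sketch of a file-static counter, assuming a hypothetical symbol
.CW count
and routine
.CW bump
(neither comes from any real program):
.P1
DATA	count<>+0(SB)/4, $0
GLOBL	count<>(SB), $4

TEXT	bump(SB), $0
	ADDL	$1, count<>+0(SB)	/* visible only within this file */
	MOVL	count<>+0(SB), R0	/* return the new value */
	RTS
.P2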
175.PP
176When a program starts, it must execute
177.P1
178	MOVL	$a6base(SB), A6
179.P2
180before accessing any global data.
(On machines such as the MIPS and SPARC that cannot load an arbitrary 32-bit constant
into a register in a single instruction, constants are loaded through the static base
183register.  The loader recognizes code that initializes the static
184base register and treats it specially.  You must be careful, however,
185not to load large constants on such machines when the static base
186register is not set up, such as early in interrupt routines.)
187.SH
188Expressions
189.PP
190Expressions are mostly what one might expect.
191Where an offset or a constant is expected,
192a primary expression with unary operators is allowed.
193A general C constant expression is allowed in parentheses.
194.PP
195Source files are preprocessed exactly as in the C compiler, so
196.CW #define
197and
198.CW #include
199work.
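.PP
For illustration, a sketch that combines the preprocessor with a parenthesized constant expression; the macro names
.CW NBUF
and
.CW BUFSZ
are hypothetical:
.P1
#define	NBUF	8
#define	BUFSZ	64
	MOVL	$(NBUF*BUFSZ), R0	/* loads the constant 512 */
.P2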
200.SH
201Addressing modes
202.PP
203The simple addressing modes are shared by all the assemblers.
204Here, for completeness, follows a table of all the 68020 addressing modes,
205since that machine has the richest set.
206In the table,
207.CW o
208is an offset, which if zero may be elided, and
209.CW d
210is a displacement, which is a constant between -128 and 127 inclusive.
211Many of the modes listed have the same name;
212scrutiny of the format will show what default is being applied.
213For instance, indexed mode with no address register supplied operates
214as though a zero-valued register were used.
215For "offset" read "displacement."
216For "\f(CW.s\fP" read one of
.CW .L
218or
219.CW .W
220followed by
221.CW *1 ,
222.CW *2 ,
223.CW *4 ,
224or
225.CW *8
226to indicate the size and scaling of the data.
227.IP
228.TS
229l lfCW.
230data register	R0
231address register	A0
232floating-point register	F0
233special names	CAAR, CACR, etc.
234constant	$con
235floating point constant	$fcon
236external symbol	name+o(SB)
237local symbol	name<>+o(SB)
238automatic symbol	name+o(SP)
239argument	name+o(FP)
240address of external	$name+o(SB)
241address of local	$name<>+o(SB)
242indirect post-increment	(A0)+
243indirect pre-decrement	-(A0)
244indirect with offset	o(A0)
245indexed with offset	o()(R0.s)
246indexed with offset	o(A0)(R0.s)
247external indexed	name+o(SB)(R0.s)
248local indexed	name<>+o(SB)(R0.s)
249automatic indexed	name+o(SP)(R0.s)
250parameter indexed	name+o(FP)(R0.s)
251offset indirect post-indexed	d(o())(R0.s)
252offset indirect post-indexed	d(o(A0))(R0.s)
253external indirect post-indexed	d(name+o(SB))(R0.s)
254local indirect post-indexed	d(name<>+o(SB))(R0.s)
255automatic indirect post-indexed	d(name+o(SP))(R0.s)
256parameter indirect post-indexed	d(name+o(FP))(R0.s)
257offset indirect pre-indexed	d(o()(R0.s))
258offset indirect pre-indexed	d(o(A0))
259offset indirect pre-indexed	d(o(A0)(R0.s))
260external indirect pre-indexed	d(name+o(SB))
261external indirect pre-indexed	d(name+o(SB)(R0.s))
262local indirect pre-indexed	d(name<>+o(SB))
263local indirect pre-indexed	d(name<>+o(SB)(R0.s))
264automatic indirect pre-indexed	d(name+o(SP))
265automatic indirect pre-indexed	d(name+o(SP)(R0.s))
266parameter indirect pre-indexed	d(name+o(FP))
267parameter indirect pre-indexed	d(name+o(FP)(R0.s))
268.TE
269.in
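.PP
As a hedged illustration of a few rows of the table, with arbitrary registers and a hypothetical external symbol
.CW tab :
.P1
	MOVL	(A0)+, R0		/* indirect post-increment */
	MOVL	R0, -(A1)		/* indirect pre-decrement */
	MOVL	8(A0), R1		/* indirect with offset 8 */
	MOVL	tab+0(SB)(R2.L*4), R3	/* external indexed: R2 as a long, scaled by 4 */
.P2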
270.SH
271Laying down data
272.PP
273Placing data in the instruction stream, say for interrupt vectors, is easy:
274the pseudo-instructions
275.CW LONG
276and
277.CW WORD
278(but not
279.CW BYTE )
280lay down the value of their single argument, of the appropriate size,
281as if it were an instruction:
282.P1
283	LONG	$12345
284.P2
285places the long 12345 (base 10)
286in the instruction stream.
287(On most machines,
288the only such operator is
289.CW WORD
290and it lays down 32-bit quantities.
291The 386 has all three:
292.CW LONG ,
293.CW WORD ,
294and
295.CW BYTE .
296The AMD64 adds
297.CW QUAD
298to that for 64-bit values.
299The 960 has only one,
300.CW LONG .)
301.PP
302Placing information in the data section is more painful.
303The pseudo-instruction
304.CW DATA
305does the work, given two arguments: an address at which to place the item,
306including its size,
307and the value to place there.  For example, to define a character array
308.CW array
309containing the characters
310.CW abc
311and a terminating null:
312.P1
313	DATA    array+0(SB)/1, $'a'
314	DATA    array+1(SB)/1, $'b'
315	DATA    array+2(SB)/1, $'c'
316	GLOBL   array(SB), $4
317.P2
318or
319.P1
320	DATA    array+0(SB)/4, $"abc\ez"
321	GLOBL   array(SB), $4
322.P2
The
.CW /1
gives the number of bytes to initialize,
.CW GLOBL
makes the symbol global, and the
.CW $4
says how many bytes the symbol occupies.
Uninitialized data is zeroed automatically, which supplies the terminating null in the first example.
The character
.CW \ez
is equivalent to the C
.CW \e0 .
335The string in a
336.CW DATA
337statement may contain a maximum of eight bytes;
338build larger strings piecewise.
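For instance, a sketch of a 12-byte string built in two pieces, assuming a hypothetical symbol
.CW msg :
.P1
	DATA    msg+0(SB)/8, $"hello wo"
	DATA    msg+8(SB)/4, $"rld\ez"
	GLOBL   msg(SB), $12
.P2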
.PP
Two pseudo-instructions,
340.CW DYNT
341and
342.CW INIT ,
343allow the (obsolete) Alef compilers to build dynamic type information during the load
344phase.
345The
346.CW DYNT
347pseudo-instruction has two forms:
348.P1
349	DYNT	, ALEF_SI_5+0(SB)
350	DYNT	ALEF_AS+0(SB), ALEF_SI_5+0(SB)
351.P2
352In the first form,
353.CW DYNT
354defines the symbol to be a small unique integer constant, chosen by the loader,
355which is some multiple of the word size.  In the second form,
356.CW DYNT
357defines the second symbol in the same way,
358places the address of the most recently
359defined text symbol in the array specified by the first symbol at the
360index defined by the value of the second symbol,
361and then adjusts the size of the array accordingly.
362.PP
363The
364.CW INIT
365pseudo-instruction takes the same parameters as a
366.CW DATA
367statement.  Its symbol is used as the base of an array and the
368data item is installed in the array at the offset specified by the most recent
369.CW DYNT
370pseudo-instruction.
371The size of the array is adjusted accordingly.
372The
373.CW DYNT
374and
375.CW INIT
376pseudo-instructions are not implemented on the 68020.
377.SH
378Defining a procedure
379.PP
380Entry points are defined by the pseudo-operation
381.CW TEXT ,
382which takes as arguments the name of the procedure (including the ubiquitous
383.CW (SB) )
384and the number of bytes of automatic storage to pre-allocate on the stack,
385which will usually be zero when writing assembly language programs.
386On machines with a link register, such as the MIPS and SPARC,
387the special value -4 instructs the loader to generate no PC save
388and restore instructions, even if the function is not a leaf.
389Here is a complete procedure that returns the sum
390of its two arguments:
391.P1
392TEXT	sum(SB), $0
393	MOVL	arg1+0(FP), R0
394	ADDL	arg2+4(FP), R0
395	RTS
396.P2
397An optional middle argument
398to the
399.CW TEXT
400pseudo-op is a bit field of options to the loader.
401Setting the 1 bit suspends profiling the function when profiling is enabled for the rest of
402the program.
403For example,
404.P1
405TEXT	sum(SB), 1, $0
406	MOVL	arg1+0(FP), R0
407	ADDL	arg2+4(FP), R0
408	RTS
409.P2
410will not be profiled; the first version above would be.
411Subroutines with peculiar state, such as system call routines,
412should not be profiled.
413.PP
414Setting the 2 bit allows multiple definitions of the same
415.CW TEXT
416symbol in a program; the loader will place only one such function in the image.
417It was emitted only by the Alef compilers.
418.PP
419Subroutines to be called from C should place their result in
420.CW R0 ,
421even if it is an address.
422Floating point values are returned in
423.CW F0 .
424Functions that return a structure to a C program
425receive as their first argument the address of the location to
426store the result;
427.CW R0
428is unused in the calling protocol for such procedures.
A routine is responsible for saving any registers it needs preserved across the calls it makes;
a called subroutine is therefore free to use any registers without saving them (``caller saves'').
431.CW A6
432and
433.CW A7
434are the exceptions as described above.
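.PP
A sketch of the caller-saves rule, assuming a hypothetical argument-less C function
.CW f
that returns an integer; the routine name
.CW addf
and the label
.CW save
are likewise illustrative:
.P1
TEXT	addf(SB), $4
	MOVL	n+0(FP), R2
	MOVL	R2, save+0(SP)	/* f may overwrite R2, so save it first */
	BSR	f(SB)		/* f leaves its result in R0 */
	ADDL	save+0(SP), R0	/* add the saved value to the result */
	RTS
.P2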
435.SH
436When in doubt
437.PP
438If you get confused, try using the
439.CW -S
440option to
441.CW 2c
442and compiling a sample program.
443The standard output is valid input to the assembler.
444.SH
445Instructions
446.PP
447The instruction set of the assembler is not identical to that
448of the machine.
449It is chosen to match what the compiler generates, augmented
450slightly by specific needs of the operating system.
451For example,
452.CW 2a
453does not distinguish between the various forms of
454.CW MOVE
455instruction: move quick, move address, etc.  Instead the context
456does the job.  For example,
457.P1
458	MOVL	$1, R1
459	MOVL	A0, R2
460	MOVW	SR, R3
461.P2
462generates official
463.CW MOVEQ ,
464.CW MOVEA ,
465and
466.CW MOVESR
467instructions.
468A number of instructions do not have the syntax necessary to specify
469their entire capabilities.  Notable examples are the bitfield
470instructions, the
471multiply and divide instructions, etc.
472For a complete set of generated instruction names (in
473.CW 2a
474notation, not Motorola's) see the file
475.CW /sys/src/cmd/2c/2.out.h .
476Despite its name, this file contains an enumeration of the
477instructions that appear in the intermediate files generated
478by the compiler, which correspond exactly to lines of assembly language.
479.PP
480The MC68000 assembler,
481.CW 1a ,
482is essentially the same, honoring the appropriate subset of the instructions
483and addressing modes.
484The definitions of these are, nonetheless, part of
485.CW 2.out.h .
486.SH
487Laying down instructions
488.PP
489The loader modifies the code produced by the assembler and compiler.
490It folds branches,
491copies short sequences of code to eliminate branches,
492and discards unreachable code.
493The first instruction of every function is assumed to be reachable.
494The pseudo-instruction
495.CW NOP ,
496which you may see in compiler output,
497means no instruction at all, rather than an instruction that does nothing.
498The loader discards all
499.CW NOP 's.
500.PP
501To generate a true
502.CW NOP
503instruction, or any other instruction not known to the assembler, use a
504.CW WORD
505pseudo-instruction.
506Such instructions on RISCs are not scheduled by the loader and must have
507their delay slots filled manually.
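.PP
For instance, the Motorola manuals give the 68020 hardware no-op the 16-bit encoding 0x4E71 (a fact taken from the processor documentation, not from this manual), so a sketch of a true no-op is:
.P1
	WORD	$0x4E71		/* hardware NOP, kept by the loader */
.P2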
508.SH
509MIPS
510.PP
511The registers are only addressed by number:
512.CW R0
513through
514.CW R31 .
515.CW R29
516is the stack pointer;
517.CW R30
518is used as the static base pointer, the analogue of
519.CW A6
520on the 68020.
521Its value is the address of the global symbol
522.CW setR30(SB) .
523The register holding returned values from subroutines is
524.CW R1 .
525When a function is called, space for the first argument
526is reserved at
527.CW 0(FP)
528but in C (not Alef) the value is passed in
529.CW R1
530instead.
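.PP
A minimal sketch of this convention, assuming a hypothetical two-argument C function
.CW sum :
.P1
TEXT	sum(SB), $0
	MOVW	b+4(FP), R2	/* second argument comes from the frame */
	ADDU	R2, R1		/* first argument is already in R1, which also holds the result */
	RET
.P2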
531.PP
532The loader uses
533.CW R28
534as a temporary.  The system uses
535.CW R26
536and
537.CW R27
538as interrupt-time temporaries.  Therefore none of these registers
539should be used in user code.
540.PP
The control registers are not known to the assembler by name.
542Instead they are numbered registers
543.CW M0 ,
544.CW M1 ,
545etc.
546Use this trick to access, say,
547.CW STATUS :
548.P1
549#define	STATUS	12
550	MOVW	M(STATUS), R1
551.P2
552.PP
553Floating point registers are called
554.CW F0
555through
556.CW F31 .
557By convention,
558.CW F24
559must be initialized to the value 0.0,
560.CW F26
561to 0.5,
562.CW F28
563to 1.0, and
564.CW F30
565to 2.0;
566this is done by the operating system.
567.PP
568The instructions and their syntax are different from those of the manufacturer's
569manual.
570There are no
571.CW lui
572and kin; instead there are
573.CW MOVW
574(move word),
575.CW MOVH
576(move halfword),
577and
578.CW MOVB
579(move byte) pseudo-instructions.  If the operand is unsigned, the instructions
580are
581.CW MOVHU
582and
583.CW MOVBU .
584The order of operands is from left to right in dataflow order, just as
585on the 68020 but not as in MIPS documentation.
586This means that the
587.CW Bcond
588instructions are reversed with respect to the book; for example, a
589.CW va
590.CW BGTZ
591generates a MIPS
592.CW bltz
593instruction.
594.PP
595The assembler is for the R2000, R3000, and most of the R4000 and R6000 architectures.
596It understands the 64-bit instructions
597.CW MOVV ,
598.CW MOVVL ,
599.CW ADDV ,
600.CW ADDVU ,
601.CW SUBV ,
602.CW SUBVU ,
603.CW MULV ,
604.CW MULVU ,
605.CW DIVV ,
606.CW DIVVU ,
607.CW SLLV ,
608.CW SRLV ,
609and
610.CW SRAV .
611The assembler does not have any cache, load-linked, or store-conditional instructions.
612.PP
613Some assembler instructions are expanded into multiple instructions by the loader.
614For example the loader may convert the load of a 32 bit constant into an
615.CW lui
616followed by an
617.CW ori .
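As an illustration, with an arbitrarily chosen constant and register, the single assembler line
.P1
	MOVW	$0x12345678, R3
.P2
might become a
.CW lui
that loads the upper half, 0x1234, followed by an
.CW ori
that supplies the lower half, 0x5678; the exact sequence is the loader's choice.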
618.PP
619Assembler instructions should be laid out as if there
620were no load, branch, or floating point compare delay slots;
621the loader will rearrange\(em\f2schedule\f1\(emthe instructions
622to guarantee correctness and improve performance.
623The only exception is that the correct scheduling of instructions
624that use control registers varies from model to model of machine
625(and is often undocumented) so you should schedule such instructions
626by hand to guarantee correct behavior.
627The loader generates
628.P1
629	NOR	R0, R0, R0
630.P2
631when it needs a true no-op instruction.
632Use exactly this instruction when scheduling code manually;
633the loader recognizes it and schedules the code before it and after it independently.  Also,
634.CW WORD
635pseudo-ops are scheduled like no-ops.
636.PP
637The
638.CW NOSCHED
639pseudo-op disables instruction scheduling
640(scheduling is enabled by default);
641.CW SCHED
642re-enables it.
643Branch folding, code copying, and dead code elimination are
644disabled for instructions that are not scheduled.
645.SH
646SPARC
647.PP
648Once you understand the Plan 9 model for the MIPS, the SPARC is familiar.
649Registers have numerical names only:
650.CW R0
651through
652.CW R31 .
653Forget about register windows: Plan 9 doesn't use them at all.
654The machine has 32 global registers, period.
655.CW R1
656[sic] is the stack pointer.
657.CW R2
658is the static base register, with value the address of
659.CW setSB(SB) .
660.CW R7
661is the return register and also the register holding the first
662argument to a C (not Alef) function, again with space reserved at
663.CW 0(FP) .
664.CW R14
665is the loader temporary.
666.PP
667Floating-point registers are exactly as on the MIPS.
668.PP
669The control registers are known by names such as
670.CW FSR .
671The instructions to access these registers are
672.CW MOVW
673instructions, for example
674.P1
675	MOVW	Y, R8
676.P2
677for the SPARC instruction
678.P1
679	rdy	%r8
680.P2
681.PP
682Move instructions are similar to those on the MIPS: pseudo-operations
683that turn into appropriate sequences of
684.CW sethi
685instructions, adds, etc.
686Instructions read from left to right.  Because the arguments are
687flipped to
688.CW SUBCC ,
689the condition codes are not inverted as on the MIPS.
690.PP
To access an alternate address space, give the ASI (address space identifier) after the register; for example, to move a word from ASI 2:
692.P1
693	MOVW	(R7, 2), R8
694.P2
695The syntax for double indexing is
696.P1
697	MOVW	(R7+R8), R9
698.P2
699.PP
700The SPARC's instruction scheduling is similar to the MIPS's.
701The official no-op instruction is:
702.P1
703	ORN	R0, R0, R0
704.P2
705.SH
706i960
707.PP
708Registers are numbered
709.CW R0
710through
711.CW R31 .
The stack pointer is
.CW R29 ;
the return register is
.CW R4 ;
the static base is
.CW R28 ,
which is initialized to the address of
.CW setSB(SB) .
720.CW R3
721must be zero; this should be done manually early in execution by
722.P1
723	SUBO	R3, R3
724.P2
725.CW R27
726is the loader temporary.
727.PP
728There is no support for floating point.
729.PP
730The Intel calling convention is not supported and cannot be used; use
731.CW BAL
732instead.
733Instructions are mostly as in the book.  The major change is that
734.CW LOAD
735and
736.CW STORE
737are both called
738.CW MOV .
739The extension character for
740.CW MOV
741is as in the manual:
742.CW O
743for ordinal,
744.CW W
745for signed, etc.
746.SH
747i386
748.PP
749The assembler assumes 32-bit protected mode.
750The register names are
751.CW SP ,
752.CW AX ,
753.CW BX ,
754.CW CX ,
755.CW DX ,
756.CW BP ,
757.CW DI ,
758and
759.CW SI .
760The stack pointer (not a pseudo-register) is
761.CW SP
762and the return register is
763.CW AX .
764There is no physical frame pointer but, as for the MIPS,
765.CW FP
766is a pseudo-register that acts as
767a frame pointer.
768.PP
769Opcode names are mostly the same as those listed in the Intel manual
770with an
771.CW L ,
772.CW W ,
773or
774.CW B
775appended to identify 32-bit,
77616-bit, and 8-bit operations.
777The exceptions are loads, stores, and conditionals.
778All load and store opcodes to and from general registers, special registers
779(such as
.CW CR0 ,
.CW CR3 ,
.CW GDTR ,
.CW IDTR ,
.CW SS ,
.CW CS ,
.CW DS ,
.CW ES ,
.CW FS ,
789and
790.CW GS )
791or memory are written
792as
793.P1
794	MOV\f2x\fP	src,dst
795.P2
796where
797.I x
798is
799.CW L ,
800.CW W ,
801or
802.CW B .
803Thus to get
804.CW AL
805use a
806.CW MOVB
807instruction.  If you need to access
808.CW AH ,
809you must mention it explicitly in a
810.CW MOVB :
811.P1
812	MOVB	AH, BX
813.P2
814There are many examples of illegal moves, for example,
815.P1
816	MOVB	BP, DI
817.P2
818that the loader actually implements as pseudo-operations.
819.PP
820The names of conditions in all conditional instructions
821.CW J , (
822.CW SET )
823follow the conventions of the 68020 instead of those of the Intel
824assembler:
825.CW JOS ,
826.CW JOC ,
827.CW JCS ,
828.CW JCC ,
829.CW JEQ ,
830.CW JNE ,
831.CW JLS ,
832.CW JHI ,
833.CW JMI ,
834.CW JPL ,
835.CW JPS ,
836.CW JPC ,
837.CW JLT ,
838.CW JGE ,
839.CW JLE ,
840and
841.CW JGT
842instead of
843.CW JO ,
844.CW JNO ,
845.CW JB ,
846.CW JNB ,
847.CW JZ ,
848.CW JNZ ,
849.CW JBE ,
850.CW JNBE ,
851.CW JS ,
852.CW JNS ,
853.CW JP ,
854.CW JNP ,
855.CW JL ,
856.CW JNL ,
857.CW JLE ,
858and
859.CW JNLE .
860.PP
861The addressing modes have syntax like
862.CW AX ,
863.CW (AX) ,
864.CW (AX)(BX*4) ,
865.CW 10(AX) ,
866and
867.CW 10(AX)(BX*4) .
868The offsets from
869.CW AX
870can be replaced by offsets from
871.CW FP
872or
873.CW SB
874to access names, for example
875.CW extern+5(SB)(AX*2) .
876.PP
877Other notes: Non-relative
878.CW JMP
879and
880.CW CALL
881have a
882.CW *
883added to the syntax.
884Only
885.CW LOOP ,
886.CW LOOPEQ ,
887and
888.CW LOOPNE
889are legal loop instructions.  Only
890.CW REP
891and
892.CW REPN
893are recognized repeaters.  These are not prefixes, but rather
894stand-alone opcodes that precede the strings, for example
895.P1
896	CLD; REP; MOVSL
897.P2
898Segment override prefixes in
899.CW MOD/RM
900fields are not supported.
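.PP
A hedged sketch that puts several of these conventions together: a hypothetical routine
.CW copyl
that copies
.CW n
32-bit words from
.CW src
to
.CW dst ,
with all three arguments taken from the frame:
.P1
TEXT	copyl(SB), $0
	MOVL	dst+0(FP), DI
	MOVL	src+4(FP), SI
	MOVL	n+8(FP), CX	/* count of 32-bit words */
	CLD			/* copy upward */
	REP; MOVSL
	RET
.P2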
901.SH
902AMD64
903.PP
904The assembler assumes 64-bit mode unless a
905.CW MODE
906pseudo-operation is given:
907.P1
908	MODE $32
909.P2
910to change to 32-bit mode.
911The effect is mainly to diagnose instructions that are illegal in
912the given mode, but the loader will also assume 32-bit operands and addresses,
913and 32-bit PC values for call and return.
914The assembler's conventions are similar to those for the 386, above.
915The architecture provides extra fixed-point registers
916.CW R8
917to
918.CW R15 .
919All registers are 64 bit, but instructions access low-order 8, 16 and 32 bits
920as described in the processor handbook.
921For example,
922.CW MOVL
923to
924.CW AX
925puts a value in the low-order 32 bits and clears the top 32 bits to zero.
926Literal operands are limited to signed 32 bit values, which are sign-extended
927to 64 bits in 64 bit operations; the exception is
928.CW MOVQ ,
929which allows 64-bit literals.
930The external registers in Plan 9's C are allocated from
931.CW R15
932down.
933.PP
934There are many new instructions, including the MMX and XMM media instructions,
935and conditional move instructions.
936MMX registers are
937.CW M0
938to
939.CW M7 ,
940and
941XMM registers are
942.CW X0
943to
944.CW X15 .
945As with the 386 instruction names,
946all new 64-bit integer instructions, and the MMX and XMM instructions
947uniformly use
948.CW L
949for `long word' (32 bits) and
950.CW Q
951for `quad word' (64 bits).
952Some instructions use
953.CW O
954(`octword') for 128-bit values, where the processor handbook
955variously uses
956.CW O
957or
958.CW DQ .
959The assembler also consistently uses
960.CW PL
961for `packed long' in
962XMM instructions, instead of
963.CW Q ,
964.CW DQ
965or
966.CW PI .
967Either
968.CW MOVL
969or
970.CW MOVQ
971can be used to move values to and from control registers, even when
972the registers might be 64 bits.
973The assembler often accepts the handbook's name to ease conversion
974of existing code (but remember that the operand order is uniformly
975source then destination).
976.PP
977C's
978.CW long
979.CW long
980type is 64 bits, but passed and returned by value, not by reference.
981More notably, C pointer values are 64 bits, and thus
982.CW long
983.CW long
984and
985.CW unsigned
986.CW long
987.CW long
988are the only integer types wide enough to hold a pointer value.
989The C compiler and library use the XMM floating-point instructions, not
990the old 387 ones, although the latter are implemented by assembler and loader.
Unlike the 386, the first argument, when it is an integer or pointer, is passed in a register,
.CW BP
(which can be referred to in assembly code by the pseudonym
.CW RARG ).
995.CW AX
996holds the return value from subroutines as before.
997Floating-point results are returned in
998.CW X0 ,
999although currently the first floating-point parameter is not passed in a register.
1000All parameters less than 8 bytes in length have 8 byte slots reserved on the stack
1001to preserve alignment and simplify variable-length argument list access,
1002including the first parameter when passed in a register,
1003even though bytes 4 to 7 are not initialized.
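.PP
A minimal sketch of this calling convention, assuming a hypothetical function
.CW sum
that takes two 64-bit integers:
.P1
TEXT	sum(SB), $0
	MOVQ	b+8(FP), AX	/* second argument, from its 8-byte stack slot */
	ADDQ	RARG, AX	/* first argument arrives in BP, named RARG */
	RET			/* result is returned in AX */
.P2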
1004.SH
1005Alpha
1006.PP
1007On the Alpha, all registers are 64 bits.  The architecture handles 32-bit values
1008by giving them a canonical format (sign extension in the case of integer registers).
1009Registers are numbered
1010.CW R0
1011through
1012.CW R31 .
1013.CW R0
1014holds the return value from subroutines, and also the first parameter.
1015.CW R30
1016is the stack pointer,
1017.CW R29
1018is the static base,
1019.CW R26
1020is the link register, and
1021.CW R27
1022and
1023.CW R28
1024are linker temporaries.
1025.PP
1026Floating point registers are numbered
1027.CW F0
1028to
1029.CW F31 .
1030.CW F28
1031contains
1032.CW 0.5 ,
1033.CW F29
1034contains
1035.CW 1.0 ,
1036and
1037.CW F30
1038contains
1039.CW 2.0 .
1040.CW F31
1041is always
1042.CW 0.0
1043on the Alpha.
1044.PP
1045The extension character for
1046.CW MOV
1047follows DEC's notation:
1048.CW B
1049for byte (8 bits),
1050.CW W
1051for word (16 bits),
1052.CW L
1053for long (32 bits),
1054and
1055.CW Q
1056for quadword (64 bits).
1057Byte and ``word'' loads and stores may be made unsigned
1058by appending a
1059.CW U .
1060.CW S
1061and
1062.CW T
1063refer to IEEE floating point single precision (32 bits) and double precision (64 bits), respectively.
1064.SH
1065Power PC
1066.PP
1067The Power PC follows the Plan 9 model set by the MIPS and SPARC,
1068not the elaborate ABIs.
1069The 32-bit instructions of the 60x and 8xx PowerPC architectures are supported;
1070there is no support for the older POWER instructions.
1071Registers are
1072.CW R0
1073through
1074.CW R31 .
1075.CW R0
1076is initialized to zero; this is done by C start up code
1077and assumed by the compiler and loader.
1078.CW R1
1079is the stack pointer.
1080.CW R2
1081is the static base register, with value the address of
1082.CW setSB(SB) .
1083.CW R3
1084is the return register and also the register holding the first
1085argument to a C function, with space reserved at
1086.CW 0(FP)
1087as on the MIPS.
1088.CW R31
1089is the loader temporary.
1090The external registers in Plan 9's C are allocated from
1091.CW R30
1092down.
1093.PP
1094Floating point registers are called
1095.CW F0
1096through
1097.CW F31 .
1098By convention, several registers are initialized
1099to specific values; this is done by the operating system.
1100.CW F27
1101must be initialized to the value
1102.CW 0x4330000080000000
1103(used by float-to-int conversion),
1104.CW F28
1105to the value 0.0,
1106.CW F29
1107to 0.5,
1108.CW F30
1109to 1.0, and
1110.CW F31
1111to 2.0.
1112.PP
1113As on the MIPS and SPARC, the assembler accepts arbitrary literals
1114as operands to
1115.CW MOVW ,
1116and also to
1117.CW ADD
1118and others where `immediate' variants exist,
1119and the loader generates sequences
1120of
1121.CW addi ,
1122.CW addis ,
1123.CW oris ,
1124etc. as required.
1125The register indirect addressing modes use the same syntax as the SPARC,
1126including double indexing when allowed.
1127.PP
1128The instruction names are generally derived from the Motorola ones,
1129subject to slight transformation:
1130the
1131.CW . ' `
1132marking the setting of condition codes is replaced by
1133.CW CC ,
1134and when the letter
1135.CW o ' `
1136represents `OE=1' it is replaced by
1137.CW V .
1138Thus
1139.CW add ,
1140.CW addo.
1141and
1142.CW subfzeo.
1143become
1144.CW ADD ,
1145.CW ADDVCC
1146and
1147.CW SUBFZEVCC .
1148As well as the three-operand conditional branch instruction
1149.CW BC ,
1150the assembler provides pseudo-instructions for the common cases:
1151.CW BEQ ,
1152.CW BNE ,
1153.CW BGT ,
1154.CW BGE ,
1155.CW BLT ,
1156.CW BLE ,
1157.CW BVC ,
1158and
1159.CW BVS .
1160The unconditional branch instruction is
1161.CW BR .
1162Indirect branches use
1163.CW "(CTR)"
1164or
1165.CW "(LR)"
1166as target.
1167.PP
1168Load or store operations are replaced by
1169.CW MOV
1170variants in the usual way:
1171.CW MOVW
1172(move word),
1173.CW MOVH
1174(move halfword with sign extension), and
1175.CW MOVB
1176(move byte with sign extension, a pseudo-instruction),
1177with unsigned variants
1178.CW MOVHZ
1179and
1180.CW MOVBZ ,
1181and byte-reversing
1182.CW MOVWBR
1183and
1184.CW MOVHBR .
1185`Load or store with update' versions are
1186.CW MOVWU ,
1187.CW MOVHU ,
1188and
1189.CW MOVBZU .
1190Load or store multiple is
1191.CW MOVMW .
1192The exceptions are the string instructions, which are
1193.CW LSW
1194and
1195.CW STSW ,
1196and the reservation instructions
1197.CW lwarx
1198and
1199.CW stwcx. ,
1200which are
1201.CW LWAR
1202and
1203.CW STWCCC ,
1204all with operands in the usual data-flow order.
1205Floating-point load or store instructions are
1206.CW FMOVD ,
1207.CW FMOVDU ,
1208.CW FMOVS ,
1209and
1210.CW FMOVSU .
1211The register to register move instructions
1212.CW fmr
1213and
1214.CW fmr.
1215are written
1216.CW FMOVD
1217and
1218.CW FMOVDCC .
.PP
The assembler knows the commonly used special-purpose registers:
.CW CR ,
.CW CTR ,
.CW DEC ,
.CW LR ,
.CW MSR ,
and
.CW XER .
The rest, which are often architecture-dependent, are referenced as
.CW SPR(n) .
The segment registers of the 60x series are similarly
.CW SEG(n) ,
but
.I n
can also be a register name, as in
.CW SEG(R3) .
Moves between special-purpose registers and general-purpose ones,
when allowed by the architecture,
are written as
.CW MOVW ,
replacing
.CW mfcr ,
.CW mtcr ,
.CW mfmsr ,
.CW mtmsr ,
.CW mtspr ,
.CW mfspr ,
.CW mftb ,
and many others.
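.PP
A brief sketch of such moves follows.
The general-purpose registers are arbitrary, and the use of
.CW SPR(272)
assumes an implementation on which that number selects SPRG0;
consult the processor manual for the numbers that apply.
.P1
	MOVW	LR, R5		/* save the link register */
	MOVW	R5, CTR		/* load the count register */
	MOVW	MSR, R6		/* mfmsr */
	MOVW	R6, MSR		/* mtmsr */
	MOVW	CR, R7		/* mfcr */
	MOVW	SPR(272), R8	/* mfspr; 272 is assumed to be SPRG0 */
	MOVW	R9, SEG(R3)	/* segment register selected by R3 */
.P2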
.PP
The fields of the condition register
.CW CR
are referenced as
.CW CR(0)
through
.CW CR(7) .
They are used by the
.CW MOVFL
(move field) pseudo-instruction,
which produces
.CW mcrf
or
.CW mtcrf .
For example:
.P1
	MOVFL	CR(3), CR(0)
	MOVFL	R3, CR(1)
	MOVFL	R3, $7, CR
.P2
They are also accepted in
the conditional branch instruction, for example
.P1
	BEQ	CR(7), label
.P2
Fields of the
.CW FPSCR
are accessed using
.CW MOVFL
in a similar way:
.P1
	MOVFL	FPSCR, F0
	MOVFL	F0, FPSCR
	MOVFL	F0, $7, FPSCR
	MOVFL	$0, FPSCR(3)
.P2
producing
.CW mffs ,
.CW mtfsf
or
.CW mtfsfi ,
as appropriate.
.SH
ARM
.PP
The assembler provides access to
.CW R0
through
.CW R14
and the
.CW PC .
The stack pointer is
.CW R13 ,
the link register is
.CW R14 ,
and the static base register is
.CW R12 .
.CW R0
is the return register and also the register holding
the first argument to a subroutine.
The external registers in Plan 9's C are allocated from
.CW R10
down.
.CW R11
is used by the loader as a temporary register.
The assembler supports the
.CW CPSR
and
.CW SPSR
registers.
It also knows about coprocessor registers
.CW C0
through
.CW C15 .
Floating-point registers are
.CW F0
through
.CW F7 ,
.CW FPSR ,
and
.CW FPCR .
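.PP
To make the role of
.CW R0
concrete, here is a minimal sketch of a one-argument subroutine;
the name
.CW incr
is hypothetical and chosen only for this example.
.P1
TEXT	incr(SB), $0
	ADD	$1, R0		/* the first argument arrives in R0 */
	RET			/* the result is returned in R0 */
.P2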
.PP
As with the other architectures, loads and stores are called
.CW MOV ,
e.g.
.CW MOVW
for load word or store word, and
.CW MOVM
for
load or store multiple,
depending on the operands.
.PP
Addressing modes are supported by suffixes to the instructions:
.CW .IA
(increment after),
.CW .IB
(increment before),
.CW .DA
(decrement after), and
.CW .DB
(decrement before).
These can only be used with the
.CW MOV
instructions.
The move multiple instruction,
.CW MOVM ,
defines a range of registers using brackets, e.g.
.CW [R0-R12] .
The special
.CW MOVM
addressing mode bits
.CW W ,
.CW U ,
and
.CW P
are written in the same manner, for example,
.CW MOVM.DB.W .
A
.CW .S
suffix allows a
.CW MOVM
instruction to access user
.CW R13
and
.CW R14
when in another processor mode.
Shifts and rotates in addressing modes are supported by binary operators
.CW <<
(logical left shift),
.CW >>
(logical right shift),
.CW ->
(arithmetic right shift), and
.CW @>
(rotate right); for example
.CW "R7>>R2"
or
.CW "R2@>2" .
The assembler does not support indexing by a shifted expression;
only names can be doubly indexed.
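.PP
The fragment below is a sketch of these forms in use, not a complete routine;
the registers and offsets are arbitrary choices for illustration.
.P1
	MOVW	4(R1), R2		/* load word */
	MOVW	R2, 8(R1)		/* store word */
	MOVM.DB.W	[R4-R7], (R13)	/* store R4-R7 below R13, updating R13 */
	MOVM.IA.W	(R13), [R4-R7]	/* reload them, updating R13 again */
	ADD	R1<<2, R0		/* R0 += R1 shifted left by 2 */
.P2
The direction of each
.CW MOVM
follows from which operand is the bracketed register list.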
.PP
Any instruction can be followed by a suffix that makes the instruction conditional:
.CW .EQ ,
.CW .NE ,
and so on, as in the ARM manual, with synonyms
.CW .HS
(for
.CW .CS )
and
.CW .LO
(for
.CW .CC ),
for example
.CW ADD.NE .
Arithmetic
and logical instructions
can have a
.CW .S
suffix, as ARM allows, to set condition codes.
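.PP
As a short sketch of both kinds of suffix, with arbitrary registers
and assuming the usual convention that
.CW CMP
compares its second operand against its first:
.P1
	CMP	R1, R0		/* set condition codes from R0 - R1 */
	MOVW.LT	R1, R0		/* if R0 < R1 (signed), copy R1 into R0 */
	SUB.S	$1, R2		/* subtract and set the condition codes */
	ADD.NE	$1, R3		/* executed only if R2 is now non-zero */
.P2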
.PP
The syntax of the
.CW MCR
and
.CW MRC
coprocessor instructions is largely as in the manual, with the usual adjustments.
The assembler directly supports only the ARM floating-point coprocessor
operations used by the compiler:
.CW CMP ,
.CW ADD ,
.CW SUB ,
.CW MUL ,
and
.CW DIV ,
all with
.CW F
or
.CW D
suffix selecting single or double precision.
Floating-point loads and stores become
.CW MOVF
and
.CW MOVD .
Conversion instructions are also specified by moves:
.CW MOVWD ,
.CW MOVWF ,
.CW MOVDW ,
.CW MOVFW ,
.CW MOVFD ,
and
.CW MOVDF .
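.PP
A hedged sketch of the floating-point forms follows;
the registers and offsets are arbitrary,
and only conversions between floating-point registers are shown.
.P1
	MOVF	0(R1), F0	/* load single precision */
	MOVFD	F0, F1		/* convert single to double */
	MOVD	8(R1), F2	/* load double precision */
	ADDD	F2, F1		/* F1 += F2 in double precision */
	MOVD	F1, 16(R1)	/* store double precision */
.P2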
.SH
AMD 29000
.PP
For details about this assembly language, which was built for the AMD 29240,
look at the sources or examine compiler output.