1.HTML "A Manual for the Plan 9 assembler
2.ft CW
3.ta 8n +8n +8n +8n +8n +8n +8n
4.ft
5.TL
6A Manual for the Plan 9 assembler
7.AU
8Rob Pike
9rob@plan9.bell-labs.com
10.SH
11Machines
12.PP
13There is an assembler for each of the MIPS, SPARC, Intel 386, AMD64,
14Power PC, and ARM.
15The 68020 assembler,
16.CW 2a ,
17(no longer distributed)
18is the oldest and in many ways the prototype.
19The assemblers are really just variations of a single program:
20they share many properties such as left-to-right assignment order for
21instruction operands and the synthesis of macro instructions
22such as
23.CW MOVE
24to hide the peculiarities of the load and store structure of the machines.
25To keep things concrete, the first part of this manual is
26specifically about the 68020.
27At the end is a description of the differences among
28the other assemblers.
29.PP
30The document, ``How to Use the Plan 9 C Compiler'', by Rob Pike,
31is a prerequisite for this manual.
32.SH
33Registers
34.PP
35All pre-defined symbols in the assembler are upper-case.
36Data registers are
37.CW R0
38through
39.CW R7 ;
40address registers are
41.CW A0
42through
43.CW A7 ;
44floating-point registers are
45.CW F0
46through
47.CW F7 .
48.PP
49A pointer in
50.CW A6
51is used by the C compiler to point to data, enabling short addresses to
52be used more often.
53The value of
54.CW A6
55is constant and must be set during C program initialization
56to the address of the externally-defined symbol
57.CW a6base .
58.PP
59The following hardware registers are defined in the assembler; their
60meaning should be obvious given a 68020 manual:
61.CW CAAR ,
62.CW CACR ,
63.CW CCR ,
64.CW DFC ,
65.CW ISP ,
66.CW MSP ,
67.CW SFC ,
68.CW SR ,
69.CW USP ,
70and
71.CW VBR .
72.PP
73The assembler also defines several pseudo-registers that
74manipulate the stack:
75.CW FP ,
76.CW SP ,
77and
78.CW TOS .
79.CW FP
80is the frame pointer, so
81.CW 0(FP)
82is the first argument,
83.CW 4(FP)
84is the second, and so on.
85.CW SP
86is the local stack pointer, where automatic variables are held
87(SP is a pseudo-register only on the 68020);
88.CW 0(SP)
89is the first automatic, and so on as with
90.CW FP .
91Finally,
92.CW TOS
93is the top-of-stack register, used for pushing parameters to procedures,
94saving temporary values, and so on.
95.PP
96The assembler and loader track these pseudo-registers so
97the above statements are true regardless of what has been
98pushed on the hardware stack, pointed to by
99.CW A7 .
100The name
101.CW A7
102refers to the hardware stack pointer, but beware of mixed use of
103.CW A7
104and the above stack-related pseudo-registers, which will cause trouble.
Note, too, that the
.CW PEA
instruction is observed by the loader to alter SP;
the loader therefore inserts a corresponding pop before every return.
109The assembler accepts a label-like name to be attached to
110.CW FP
111and
112.CW SP
113uses, such as
114.CW p+0(FP) ,
115to help document that
116.CW p
117is the first argument to a routine.
118The name goes in the symbol table but has no significance to the result
119of the program.
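.PP
As a minimal sketch of these conventions, the hypothetical routine below
(the names
.CW twice ,
.CW n ,
and
.CW tmp
are invented for illustration) reads its first argument through
.CW FP
and parks it in its first automatic through
.CW SP .
It assumes that four bytes of automatic storage are requested with the
.CW TEXT
pseudo-operation, which is described later in this manual:
.P1
TEXT	twice(SB), $4
	MOVL	n+0(FP), R0		/* first argument */
	MOVL	R0, tmp+0(SP)		/* first automatic */
	ADDL	tmp+0(SP), R0		/* R0 = n + n */
	RTS
.P2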
120.SH
121Referring to data
122.PP
123All external references must be made relative to some pseudo-register,
124either
125.CW PC
126(the virtual program counter) or
127.CW SB
128(the ``static base'' register).
129.CW PC
130counts instructions, not bytes of data.
131For example, to branch to the second following instruction, that is,
132to skip one instruction, one may write
.P1
	BRA	2(PC)
.P2
Labels are also allowed, as in
.P1
	BRA	return
	NOP
return:
	RTS
.P2
143When using labels, there is no
144.CW (PC)
145annotation.
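.PP
As an illustrative sketch of a PC-relative branch in context, the fragment below
leaves the absolute value of
.CW R0
in
.CW R0 .
It assumes the opcode names
.CW TSTL ,
.CW BGE ,
and
.CW NEGL
as enumerated in
.CW 2.out.h :
.P1
	TSTL	R0
	BGE	2(PC)		/* non-negative: skip one instruction */
	NEGL	R0
	RTS
.P2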
146.PP
147The pseudo-register
148.CW SB
149refers to the beginning of the address space of the program.
150Thus, references to global data and procedures are written as
151offsets to
152.CW SB ,
153as in
.P1
	MOVL	$array(SB), TOS
.P2
to push the address of a global array on the stack, or
.P1
	MOVL	array+4(SB), TOS
.P2
to push the second (4-byte) element of the array.
Note the use of an offset; the complete list of addressing modes is given below.
Similarly, subroutine calls must use
.CW SB :
.P1
	BSR	exit(SB)
.P2
File-static variables have syntax
.P1
	local<>+4(SB)
.P2
172The
173.CW <>
174will be filled in at load time by a unique integer.
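.PP
For instance, a file-private counter might be sketched as follows.
The names
.CW count
and
.CW bump
are hypothetical, and the
.CW DATA ,
.CW GLOBL ,
and
.CW TEXT
pseudo-instructions are described later in this manual:
.P1
DATA	count<>+0(SB)/4, $0
GLOBL	count<>(SB), $4

TEXT	bump(SB), $0
	ADDL	$1, count<>+0(SB)	/* visible only within this file */
	MOVL	count<>+0(SB), R0
	RTS
.P2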
175.PP
176When a program starts, it must execute
.P1
	MOVL	$a6base(SB), A6
.P2
180before accessing any global data.
181(On machines such as the MIPS and SPARC that cannot load a register
182in a single instruction, constants are loaded through the static base
183register.  The loader recognizes code that initializes the static
184base register and treats it specially.  You must be careful, however,
185not to load large constants on such machines when the static base
186register is not set up, such as early in interrupt routines.)
187.SH
188Expressions
189.PP
190Expressions are mostly what one might expect.
191Where an offset or a constant is expected,
192a primary expression with unary operators is allowed.
193A general C constant expression is allowed in parentheses.
194.PP
195Source files are preprocessed exactly as in the C compiler, so
196.CW #define
197and
198.CW #include
199work.
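.PP
A brief sketch of both points, using the hypothetical names
.CW NELEM ,
.CW ELSIZE ,
and
.CW array :
the parenthesized operands are general constant expressions,
evaluated by the assembler after the preprocessor has expanded the macros.
.P1
#define	NELEM	8
#define	ELSIZE	4
	MOVL	$(NELEM*ELSIZE), R0		/* the constant 32 */
	MOVL	array+(3*ELSIZE)(SB), R1	/* the fourth element */
.P2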
200.SH
201Addressing modes
202.PP
203The simple addressing modes are shared by all the assemblers.
204Here, for completeness, follows a table of all the 68020 addressing modes,
205since that machine has the richest set.
206In the table,
207.CW o
208is an offset, which if zero may be elided, and
209.CW d
210is a displacement, which is a constant between -128 and 127 inclusive.
211Many of the modes listed have the same name;
212scrutiny of the format will show what default is being applied.
213For instance, indexed mode with no address register supplied operates
214as though a zero-valued register were used.
215For "offset" read "displacement."
For "\f(CW.s\fP" read one of
.CW .L
or
.CW .W
220followed by
221.CW *1 ,
222.CW *2 ,
223.CW *4 ,
224or
225.CW *8
226to indicate the size and scaling of the data.
.IP
.TS
l lfCW.
data register	R0
address register	A0
floating-point register	F0
special names	CAAR, CACR, etc.
constant	$con
floating point constant	$fcon
external symbol	name+o(SB)
local symbol	name<>+o(SB)
automatic symbol	name+o(SP)
argument	name+o(FP)
address of external	$name+o(SB)
address of local	$name<>+o(SB)
indirect post-increment	(A0)+
indirect pre-decrement	-(A0)
indirect with offset	o(A0)
indexed with offset	o()(R0.s)
indexed with offset	o(A0)(R0.s)
external indexed	name+o(SB)(R0.s)
local indexed	name<>+o(SB)(R0.s)
automatic indexed	name+o(SP)(R0.s)
parameter indexed	name+o(FP)(R0.s)
offset indirect post-indexed	d(o())(R0.s)
offset indirect post-indexed	d(o(A0))(R0.s)
external indirect post-indexed	d(name+o(SB))(R0.s)
local indirect post-indexed	d(name<>+o(SB))(R0.s)
automatic indirect post-indexed	d(name+o(SP))(R0.s)
parameter indirect post-indexed	d(name+o(FP))(R0.s)
offset indirect pre-indexed	d(o()(R0.s))
offset indirect pre-indexed	d(o(A0))
offset indirect pre-indexed	d(o(A0)(R0.s))
external indirect pre-indexed	d(name+o(SB))
external indirect pre-indexed	d(name+o(SB)(R0.s))
local indirect pre-indexed	d(name<>+o(SB))
local indirect pre-indexed	d(name<>+o(SB)(R0.s))
automatic indirect pre-indexed	d(name+o(SP))
automatic indirect pre-indexed	d(name+o(SP)(R0.s))
parameter indirect pre-indexed	d(name+o(FP))
parameter indirect pre-indexed	d(name+o(FP)(R0.s))
.TE
269.in
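.PP
A few of the simpler modes appear in the sketch below, which assumes a global array of longs named
.CW array
(a hypothetical symbol):
.P1
	MOVL	$array(SB), A0		/* address of external */
	MOVL	(A0)+, R0		/* indirect post-increment */
	ADDL	4(A0), R0		/* indirect with offset */
	MOVL	array+0(SB)(R1.L*4), R2	/* external indexed */
.P2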
270.SH
271Laying down data
272.PP
273Placing data in the instruction stream, say for interrupt vectors, is easy:
274the pseudo-instructions
275.CW LONG
276and
277.CW WORD
278(but not
279.CW BYTE )
280lay down the value of their single argument, of the appropriate size,
281as if it were an instruction:
.P1
	LONG	$12345
.P2
285places the long 12345 (base 10)
286in the instruction stream.
287(On most machines,
288the only such operator is
289.CW WORD
290and it lays down 32-bit quantities.
291The 386 has all three:
292.CW LONG ,
293.CW WORD ,
294and
295.CW BYTE .
296The AMD64 adds
297.CW QUAD
298to that for 64-bit values.
299The 960 has only one,
300.CW LONG .)
301.PP
302Placing information in the data section is more painful.
303The pseudo-instruction
304.CW DATA
305does the work, given two arguments: an address at which to place the item,
306including its size,
307and the value to place there.  For example, to define a character array
308.CW array
309containing the characters
310.CW abc
311and a terminating null:
.P1
	DATA    array+0(SB)/1, $'a'
	DATA    array+1(SB)/1, $'b'
	DATA    array+2(SB)/1, $'c'
	GLOBL   array(SB), $4
.P2
or
.P1
	DATA    array+0(SB)/4, $"abc\ez"
	GLOBL   array(SB), $4
.P2
The
.CW /1
gives the number of bytes being defined,
326.CW GLOBL
327makes the symbol global, and the
328.CW $4
329says how many bytes the symbol occupies.
330Uninitialized data is zeroed automatically.
331The character
332.CW \ez
333is equivalent to the C
.CW \e0 .
335The string in a
336.CW DATA
337statement may contain a maximum of eight bytes;
338build larger strings piecewise.
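.PP
As a sketch of building a string piecewise, the 13-byte string
.CW "hello, world"
(counting its terminating null) might be laid down in two pieces under the hypothetical name
.CW msg ;
the final byte is left uninitialized and is therefore zero:
.P1
	DATA    msg+0(SB)/8, $"hello, w"
	DATA    msg+8(SB)/4, $"orld"
	GLOBL   msg(SB), $13
.P2
.PP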
339Two pseudo-instructions,
340.CW DYNT
341and
342.CW INIT ,
343allow the (obsolete) Alef compilers to build dynamic type information during the load
344phase.
345The
346.CW DYNT
347pseudo-instruction has two forms:
.P1
	DYNT	, ALEF_SI_5+0(SB)
	DYNT	ALEF_AS+0(SB), ALEF_SI_5+0(SB)
.P2
352In the first form,
353.CW DYNT
354defines the symbol to be a small unique integer constant, chosen by the loader,
355which is some multiple of the word size.  In the second form,
356.CW DYNT
357defines the second symbol in the same way,
358places the address of the most recently
359defined text symbol in the array specified by the first symbol at the
360index defined by the value of the second symbol,
361and then adjusts the size of the array accordingly.
362.PP
363The
364.CW INIT
365pseudo-instruction takes the same parameters as a
366.CW DATA
367statement.  Its symbol is used as the base of an array and the
368data item is installed in the array at the offset specified by the most recent
369.CW DYNT
370pseudo-instruction.
371The size of the array is adjusted accordingly.
372The
373.CW DYNT
374and
375.CW INIT
376pseudo-instructions are not implemented on the 68020.
377.SH
378Defining a procedure
379.PP
380Entry points are defined by the pseudo-operation
381.CW TEXT ,
382which takes as arguments the name of the procedure (including the ubiquitous
383.CW (SB) )
384and the number of bytes of automatic storage to pre-allocate on the stack,
385which will usually be zero when writing assembly language programs.
386On machines with a link register, such as the MIPS and SPARC,
387the special value -4 instructs the loader to generate no PC save
388and restore instructions, even if the function is not a leaf.
389Here is a complete procedure that returns the sum
390of its two arguments:
.P1
TEXT	sum(SB), $0
	MOVL	arg1+0(FP), R0
	ADDL	arg2+4(FP), R0
	RTS
.P2
397An optional middle argument
398to the
399.CW TEXT
400pseudo-op is a bit field of options to the loader.
401Setting the 1 bit suspends profiling the function when profiling is enabled for the rest of
402the program.
403For example,
.P1
TEXT	sum(SB), 1, $0
	MOVL	arg1+0(FP), R0
	ADDL	arg2+4(FP), R0
	RTS
.P2
410will not be profiled; the first version above would be.
411Subroutines with peculiar state, such as system call routines,
412should not be profiled.
413.PP
414Setting the 2 bit allows multiple definitions of the same
415.CW TEXT
416symbol in a program; the loader will place only one such function in the image.
417It was emitted only by the Alef compilers.
418.PP
419Subroutines to be called from C should place their result in
420.CW R0 ,
421even if it is an address.
422Floating point values are returned in
423.CW F0 .
424Functions that return a structure to a C program
425receive as their first argument the address of the location to
426store the result;
427.CW R0
428is unused in the calling protocol for such procedures.
A caller is responsible for saving any registers it needs preserved across a call,
so a subroutine is free to use any registers without saving them (``caller saves'').
431.CW A6
432and
433.CW A7
434are the exceptions as described above.
435.SH
436When in doubt
437.PP
438If you get confused, try using the
439.CW -S
440option to
441.CW 2c
442and compiling a sample program.
443The standard output is valid input to the assembler.
444.SH
445Instructions
446.PP
447The instruction set of the assembler is not identical to that
448of the machine.
449It is chosen to match what the compiler generates, augmented
450slightly by specific needs of the operating system.
451For example,
452.CW 2a
453does not distinguish between the various forms of
454.CW MOVE
455instruction: move quick, move address, etc.  Instead the context
456does the job.  For example,
.P1
	MOVL	$1, R1
	MOVL	A0, R2
	MOVW	SR, R3
.P2
462generates official
463.CW MOVEQ ,
464.CW MOVEA ,
465and
466.CW MOVESR
467instructions.
468A number of instructions do not have the syntax necessary to specify
469their entire capabilities.  Notable examples are the bitfield
470instructions, the
471multiply and divide instructions, etc.
472For a complete set of generated instruction names (in
473.CW 2a
474notation, not Motorola's) see the file
475.CW /sys/src/cmd/2c/2.out.h .
476Despite its name, this file contains an enumeration of the
477instructions that appear in the intermediate files generated
478by the compiler, which correspond exactly to lines of assembly language.
479.SH
480Laying down instructions
481.PP
482The loader modifies the code produced by the assembler and compiler.
483It folds branches,
484copies short sequences of code to eliminate branches,
485and discards unreachable code.
486The first instruction of every function is assumed to be reachable.
487The pseudo-instruction
488.CW NOP ,
489which you may see in compiler output,
490means no instruction at all, rather than an instruction that does nothing.
491The loader discards all
492.CW NOP 's.
493.PP
494To generate a true
495.CW NOP
496instruction, or any other instruction not known to the assembler, use a
497.CW WORD
498pseudo-instruction.
499Such instructions on RISCs are not scheduled by the loader and must have
500their delay slots filled manually.
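.PP
For example, the 68020's own
.CW NOP
instruction has the opcode 0x4E71, so a true no-op can be laid down with a sketch like this:
.P1
	WORD	$0x4E71		/* 68020 NOP, emitted as a 16-bit word */
.P2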
501.SH
502MIPS
503.PP
504The registers are only addressed by number:
505.CW R0
506through
507.CW R31 .
508.CW R29
509is the stack pointer;
510.CW R30
511is used as the static base pointer, the analogue of
512.CW A6
513on the 68020.
514Its value is the address of the global symbol
515.CW setR30(SB) .
516The register holding returned values from subroutines is
517.CW R1 .
518When a function is called, space for the first argument
519is reserved at
520.CW 0(FP)
521but in C (not Alef) the value is passed in
522.CW R1
523instead.
524.PP
525The loader uses
526.CW R28
527as a temporary.  The system uses
528.CW R26
529and
530.CW R27
531as interrupt-time temporaries.  Therefore none of these registers
532should be used in user code.
533.PP
534The control registers are not known to the assembler.
535Instead they are numbered registers
536.CW M0 ,
537.CW M1 ,
538etc.
539Use this trick to access, say,
540.CW STATUS :
.P1
#define	STATUS	12
	MOVW	M(STATUS), R1
.P2
545.PP
546Floating point registers are called
547.CW F0
548through
549.CW F31 .
550By convention,
551.CW F24
552must be initialized to the value 0.0,
553.CW F26
554to 0.5,
555.CW F28
556to 1.0, and
557.CW F30
558to 2.0;
559this is done by the operating system.
560.PP
561The instructions and their syntax are different from those of the manufacturer's
562manual.
563There are no
564.CW lui
565and kin; instead there are
566.CW MOVW
567(move word),
568.CW MOVH
569(move halfword),
570and
571.CW MOVB
572(move byte) pseudo-instructions.  If the operand is unsigned, the instructions
573are
574.CW MOVHU
575and
576.CW MOVBU .
577The order of operands is from left to right in dataflow order, just as
578on the 68020 but not as in MIPS documentation.
579This means that the
580.CW Bcond
581instructions are reversed with respect to the book; for example, a
582.CW va
583.CW BGTZ
584generates a MIPS
585.CW bltz
586instruction.
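.PP
A short sketch of the resulting style, with hypothetical globals
.CW x
and
.CW buf ;
in each line the data flows from left to right, and the loader chooses the machine instructions:
.P1
	MOVW	$0x12345678, R3		/* may become lui followed by ori */
	MOVW	R3, x(SB)		/* store word */
	MOVBU	buf+1(SB), R4		/* load unsigned byte */
	ADDU	R3, R4, R5		/* R5 = R4 + R3 */
.P2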
587.PP
588The assembler is for the R2000, R3000, and most of the R4000 and R6000 architectures.
589It understands the 64-bit instructions
590.CW MOVV ,
591.CW MOVVL ,
592.CW ADDV ,
593.CW ADDVU ,
594.CW SUBV ,
595.CW SUBVU ,
596.CW MULV ,
597.CW MULVU ,
598.CW DIVV ,
599.CW DIVVU ,
600.CW SLLV ,
601.CW SRLV ,
602and
603.CW SRAV .
604The assembler does not have any cache, load-linked, or store-conditional instructions.
605.PP
606Some assembler instructions are expanded into multiple instructions by the loader.
For example, the loader may convert the load of a 32-bit constant into an
608.CW lui
609followed by an
610.CW ori .
611.PP
612Assembler instructions should be laid out as if there
613were no load, branch, or floating point compare delay slots;
614the loader will rearrange\(em\f2schedule\f1\(emthe instructions
615to guarantee correctness and improve performance.
616The only exception is that the correct scheduling of instructions
617that use control registers varies from model to model of machine
618(and is often undocumented) so you should schedule such instructions
619by hand to guarantee correct behavior.
620The loader generates
.P1
	NOR	R0, R0, R0
.P2
624when it needs a true no-op instruction.
625Use exactly this instruction when scheduling code manually;
626the loader recognizes it and schedules the code before it and after it independently.  Also,
627.CW WORD
628pseudo-ops are scheduled like no-ops.
629.PP
630The
631.CW NOSCHED
632pseudo-op disables instruction scheduling
633(scheduling is enabled by default);
634.CW SCHED
635re-enables it.
636Branch folding, code copying, and dead code elimination are
637disabled for instructions that are not scheduled.
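.PP
A hedged sketch of hand-scheduled code around a control register follows.
The number of no-ops required after each access depends on the processor model;
the single no-ops here are placeholders, not a recommendation.
.P1
#define	STATUS	12
	NOSCHED
	MOVW	M(STATUS), R1
	NOR	R0, R0, R0
	OR	$1, R1
	MOVW	R1, M(STATUS)
	NOR	R0, R0, R0
	SCHED
.P2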
638.SH
639SPARC
640.PP
641Once you understand the Plan 9 model for the MIPS, the SPARC is familiar.
642Registers have numerical names only:
643.CW R0
644through
645.CW R31 .
646Forget about register windows: Plan 9 doesn't use them at all.
647The machine has 32 global registers, period.
648.CW R1
649[sic] is the stack pointer.
650.CW R2
651is the static base register, with value the address of
652.CW setSB(SB) .
653.CW R7
654is the return register and also the register holding the first
655argument to a C (not Alef) function, again with space reserved at
656.CW 0(FP) .
657.CW R14
658is the loader temporary.
659.PP
660Floating-point registers are exactly as on the MIPS.
661.PP
662The control registers are known by names such as
663.CW FSR .
664The instructions to access these registers are
665.CW MOVW
666instructions, for example
.P1
	MOVW	Y, R8
.P2
for the SPARC instruction
.P1
	rdy	%r8
.P2
674.PP
675Move instructions are similar to those on the MIPS: pseudo-operations
676that turn into appropriate sequences of
677.CW sethi
678instructions, adds, etc.
679Instructions read from left to right.  Because the arguments are
680flipped to
681.CW SUBCC ,
682the condition codes are not inverted as on the MIPS.
683.PP
To name an address space identifier (ASI), write it after the register
inside the parentheses; for example, to move a word from ASI 2:
.P1
	MOVW	(R7, 2), R8
.P2
The syntax for double indexing is
.P1
	MOVW	(R7+R8), R9
.P2
692.PP
693The SPARC's instruction scheduling is similar to the MIPS's.
694The official no-op instruction is:
.P1
	ORN	R0, R0, R0
.P2
698.SH
699i960
700.PP
701Registers are numbered
702.CW R0
703through
704.CW R31 .
705Stack pointer is
706.CW R29 ;
707return register is
708.CW R4 ;
709static base is
710.CW R28 ;
711it is initialized to the address of
712.CW setSB(SB) .
713.CW R3
714must be zero; this should be done manually early in execution by
.P1
	SUBO	R3, R3
.P2
718.CW R27
719is the loader temporary.
720.PP
721There is no support for floating point.
722.PP
723The Intel calling convention is not supported and cannot be used; use
724.CW BAL
725instead.
726Instructions are mostly as in the book.  The major change is that
727.CW LOAD
728and
729.CW STORE
730are both called
731.CW MOV .
732The extension character for
733.CW MOV
734is as in the manual:
735.CW O
736for ordinal,
737.CW W
738for signed, etc.
739.SH
740i386
741.PP
742The assembler assumes 32-bit protected mode.
743The register names are
744.CW SP ,
745.CW AX ,
746.CW BX ,
747.CW CX ,
748.CW DX ,
749.CW BP ,
750.CW DI ,
751and
752.CW SI .
753The stack pointer (not a pseudo-register) is
754.CW SP
755and the return register is
756.CW AX .
757There is no physical frame pointer but, as for the MIPS,
758.CW FP
759is a pseudo-register that acts as
760a frame pointer.
761.PP
762Opcode names are mostly the same as those listed in the Intel manual
763with an
764.CW L ,
765.CW W ,
766or
767.CW B
768appended to identify 32-bit,
76916-bit, and 8-bit operations.
770The exceptions are loads, stores, and conditionals.
771All load and store opcodes to and from general registers, special registers
772(such as
.CW CR0 ,
.CW CR3 ,
.CW GDTR ,
.CW IDTR ,
.CW SS ,
.CW CS ,
.CW DS ,
.CW ES ,
.CW FS ,
and
.CW GS )
784or memory are written
785as
.P1
	MOV\f2x\fP	src,dst
.P2
789where
790.I x
791is
792.CW L ,
793.CW W ,
794or
795.CW B .
796Thus to get
797.CW AL
798use a
799.CW MOVB
800instruction.  If you need to access
801.CW AH ,
802you must mention it explicitly in a
803.CW MOVB :
.P1
	MOVB	AH, BX
.P2
There are many examples of illegal moves, for example,
.P1
	MOVB	BP, DI
.P2
811that the loader actually implements as pseudo-operations.
812.PP
813The names of conditions in all conditional instructions
814.CW J , (
815.CW SET )
816follow the conventions of the 68020 instead of those of the Intel
817assembler:
818.CW JOS ,
819.CW JOC ,
820.CW JCS ,
821.CW JCC ,
822.CW JEQ ,
823.CW JNE ,
824.CW JLS ,
825.CW JHI ,
826.CW JMI ,
827.CW JPL ,
828.CW JPS ,
829.CW JPC ,
830.CW JLT ,
831.CW JGE ,
832.CW JLE ,
833and
834.CW JGT
835instead of
836.CW JO ,
837.CW JNO ,
838.CW JB ,
839.CW JNB ,
840.CW JZ ,
841.CW JNZ ,
842.CW JBE ,
843.CW JNBE ,
844.CW JS ,
845.CW JNS ,
846.CW JP ,
847.CW JNP ,
848.CW JL ,
849.CW JNL ,
850.CW JLE ,
851and
852.CW JNLE .
853.PP
854The addressing modes have syntax like
855.CW AX ,
856.CW (AX) ,
857.CW (AX)(BX*4) ,
858.CW 10(AX) ,
859and
860.CW 10(AX)(BX*4) .
861The offsets from
862.CW AX
863can be replaced by offsets from
864.CW FP
865or
866.CW SB
867to access names, for example
868.CW extern+5(SB)(AX*2) .
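.PP
Putting these conventions together, the 68020
.CW sum
procedure shown earlier might be written for the 386 as the following sketch,
which assumes the opcode name
.CW RET
for the return instruction:
.P1
TEXT	sum(SB), $0
	MOVL	arg1+0(FP), AX
	ADDL	arg2+4(FP), AX
	RET
.P2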
869.PP
870Other notes: Non-relative
871.CW JMP
872and
873.CW CALL
874have a
875.CW *
876added to the syntax.
877Only
878.CW LOOP ,
879.CW LOOPEQ ,
880and
881.CW LOOPNE
882are legal loop instructions.  Only
883.CW REP
884and
885.CW REPN
886are recognized repeaters.  These are not prefixes, but rather
887stand-alone opcodes that precede the strings, for example
.P1
	CLD; REP; MOVSL
.P2
891Segment override prefixes in
892.CW MOD/RM
893fields are not supported.
894.SH
895AMD64
896.PP
897The assembler assumes 64-bit mode unless a
898.CW MODE
899pseudo-operation is given:
.P1
	MODE $32
.P2
903to change to 32-bit mode.
904The effect is mainly to diagnose instructions that are illegal in
905the given mode, but the loader will also assume 32-bit operands and addresses,
906and 32-bit PC values for call and return.
907The assembler's conventions are similar to those for the 386, above.
908The architecture provides extra fixed-point registers
909.CW R8
910to
911.CW R15 .
912All registers are 64 bit, but instructions access low-order 8, 16 and 32 bits
913as described in the processor handbook.
914For example,
915.CW MOVL
916to
917.CW AX
918puts a value in the low-order 32 bits and clears the top 32 bits to zero.
919Literal operands are limited to signed 32 bit values, which are sign-extended
920to 64 bits in 64 bit operations; the exception is
921.CW MOVQ ,
922which allows 64-bit literals.
923The external registers in Plan 9's C are allocated from
924.CW R15
925down.
926.PP
927There are many new instructions, including the MMX and XMM media instructions,
928and conditional move instructions.
929MMX registers are
930.CW M0
931to
932.CW M7 ,
933and
934XMM registers are
935.CW X0
936to
937.CW X15 .
938As with the 386 instruction names,
939all new 64-bit integer instructions, and the MMX and XMM instructions
940uniformly use
941.CW L
942for `long word' (32 bits) and
943.CW Q
944for `quad word' (64 bits).
945Some instructions use
946.CW O
947(`octword') for 128-bit values, where the processor handbook
948variously uses
949.CW O
950or
951.CW DQ .
952The assembler also consistently uses
953.CW PL
954for `packed long' in
955XMM instructions, instead of
956.CW Q ,
957.CW DQ
958or
959.CW PI .
960Either
961.CW MOVL
962or
963.CW MOVQ
964can be used to move values to and from control registers, even when
965the registers might be 64 bits.
966The assembler often accepts the handbook's name to ease conversion
967of existing code (but remember that the operand order is uniformly
968source then destination).
969.PP
970C's
971.CW long
972.CW long
973type is 64 bits, but passed and returned by value, not by reference.
974More notably, C pointer values are 64 bits, and thus
975.CW long
976.CW long
977and
978.CW unsigned
979.CW long
980.CW long
981are the only integer types wide enough to hold a pointer value.
982The C compiler and library use the XMM floating-point instructions, not
983the old 387 ones, although the latter are implemented by assembler and loader.
Unlike the 386, the first integer or pointer argument is passed in a register,
.CW BP
(it can be referred to in assembly code by the pseudonym
.CW RARG ).
988.CW AX
989holds the return value from subroutines as before.
990Floating-point results are returned in
991.CW X0 ,
992although currently the first floating-point parameter is not passed in a register.
993All parameters less than 8 bytes in length have 8 byte slots reserved on the stack
994to preserve alignment and simplify variable-length argument list access,
995including the first parameter when passed in a register,
996even though bytes 4 to 7 are not initialized.
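.PP
A sketch of these conventions for a function taking two 64-bit integer arguments
(the name
.CW sum64
and the parameter name
.CW b
are hypothetical, and
.CW RET
is assumed as the return opcode, as on the 386):
.P1
TEXT	sum64(SB), $0
	MOVQ	RARG, AX		/* first argument arrives in BP */
	ADDQ	b+8(FP), AX		/* second argument, in its 8-byte slot */
	RET
.P2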
997.
998.SH
999Power PC
1000.PP
1001The Power PC follows the Plan 9 model set by the MIPS and SPARC,
1002not the elaborate ABIs.
1003The 32-bit instructions of the 60x and 8xx PowerPC architectures are supported;
1004there is no support for the older POWER instructions.
1005Registers are
1006.CW R0
1007through
1008.CW R31 .
1009.CW R0
1010is initialized to zero; this is done by C start up code
1011and assumed by the compiler and loader.
1012.CW R1
1013is the stack pointer.
1014.CW R2
1015is the static base register, with value the address of
1016.CW setSB(SB) .
1017.CW R3
1018is the return register and also the register holding the first
1019argument to a C function, with space reserved at
1020.CW 0(FP)
1021as on the MIPS.
1022.CW R31
1023is the loader temporary.
1024The external registers in Plan 9's C are allocated from
1025.CW R30
1026down.
1027.PP
1028Floating point registers are called
1029.CW F0
1030through
1031.CW F31 .
1032By convention, several registers are initialized
1033to specific values; this is done by the operating system.
1034.CW F27
1035must be initialized to the value
1036.CW 0x4330000080000000
1037(used by float-to-int conversion),
1038.CW F28
1039to the value 0.0,
1040.CW F29
1041to 0.5,
1042.CW F30
1043to 1.0, and
1044.CW F31
1045to 2.0.
1046.PP
1047As on the MIPS and SPARC, the assembler accepts arbitrary literals
1048as operands to
1049.CW MOVW ,
1050and also to
1051.CW ADD
1052and others where `immediate' variants exist,
1053and the loader generates sequences
1054of
1055.CW addi ,
1056.CW addis ,
1057.CW oris ,
1058etc. as required.
1059The register indirect addressing modes use the same syntax as the SPARC,
1060including double indexing when allowed.
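.PP
For example, in this sketch the loader is left to expand the literal move,
and the routine follows the argument and return conventions above.
The name
.CW addmagic
is hypothetical, and
.CW RETURN
is assumed to be the assembler's name for the subroutine return.
.P1
TEXT	addmagic(SB), $0
	MOVW	$0x12345678, R4		/* loader synthesizes the constant */
	ADD	R4, R3			/* first argument and result are in R3 */
	RETURN
.P2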
1061.PP
1062The instruction names are generally derived from the Motorola ones,
1063subject to slight transformation:
1064the
1065.CW . ' `
1066marking the setting of condition codes is replaced by
1067.CW CC ,
1068and when the letter
1069.CW o ' `
1070represents `OE=1' it is replaced by
1071.CW V .
1072Thus
1073.CW add ,
1074.CW addo.
1075and
1076.CW subfzeo.
1077become
1078.CW ADD ,
1079.CW ADDVCC
1080and
1081.CW SUBFZEVCC .
1082As well as the three-operand conditional branch instruction
1083.CW BC ,
1084the assembler provides pseudo-instructions for the common cases:
1085.CW BEQ ,
1086.CW BNE ,
1087.CW BGT ,
1088.CW BGE ,
1089.CW BLT ,
1090.CW BLE ,
1091.CW BVC ,
1092and
1093.CW BVS .
1094The unconditional branch instruction is
1095.CW BR .
1096Indirect branches use
1097.CW "(CTR)"
1098or
1099.CW "(LR)"
1100as target.
1101.PP
1102Load or store operations are replaced by
1103.CW MOV
1104variants in the usual way:
1105.CW MOVW
1106(move word),
1107.CW MOVH
1108(move halfword with sign extension), and
1109.CW MOVB
1110(move byte with sign extension, a pseudo-instruction),
1111with unsigned variants
1112.CW MOVHZ
1113and
1114.CW MOVBZ ,
1115and byte-reversing
1116.CW MOVWBR
1117and
1118.CW MOVHBR .
1119`Load or store with update' versions are
1120.CW MOVWU ,
1121.CW MOVHU ,
1122and
1123.CW MOVBZU .
1124Load or store multiple is
1125.CW MOVMW .
1126The exceptions are the string instructions, which are
1127.CW LSW
1128and
1129.CW STSW ,
1130and the reservation instructions
1131.CW lwarx
1132and
1133.CW stwcx. ,
1134which are
1135.CW LWAR
1136and
1137.CW STWCCC ,
1138all with operands in the usual data-flow order.
1139Floating-point load or store instructions are
1140.CW FMOVD ,
1141.CW FMOVDU ,
1142.CW FMOVS ,
1143and
1144.CW FMOVSU .
1145The register to register move instructions
1146.CW fmr
1147and
1148.CW fmr.
1149are written
1150.CW FMOVD
1151and
1152.CW FMOVDCC .
1153.PP
1154The assembler knows the commonly used special purpose registers:
1155.CW CR ,
1156.CW CTR ,
1157.CW DEC ,
1158.CW LR ,
1159.CW MSR ,
1160and
1161.CW XER .
1162The rest, which are often architecture-dependent, are referenced as
1163.CW SPR(n) .
1164The segment registers of the 60x series are similarly
1165.CW SEG(n) ,
1166but
1167.I n
1168can also be a register name, as in
1169.CW SEG(R3) .
1170Moves between special purpose registers and general purpose ones,
1171when allowed by the architecture,
1172are written as
1173.CW MOVW ,
1174replacing
1175.CW mfcr ,
1176.CW mtcr ,
1177.CW mfmsr ,
1178.CW mtmsr ,
1179.CW mtspr ,
1180.CW mfspr ,
1181.CW mftb ,
1182and many others.
1183.PP
1184The fields of the condition register
1185.CW CR
1186are referenced as
1187.CW CR(0)
1188through
1189.CW CR(7) .
1190They are used by the
1191.CW MOVFL
1192(move field) pseudo-instruction,
1193which produces
1194.CW mcrf
1195or
1196.CW mtcrf .
1197For example:
.P1
	MOVFL	CR(3), CR(0)
	MOVFL	R3, CR(1)
	MOVFL	R3, $7, CR
.P2
They are also accepted in
the conditional branch instruction, for example
.P1
	BEQ	CR(7), label
.P2
1208Fields of the
1209.CW FPSCR
1210are accessed using
1211.CW MOVFL
1212in a similar way:
.P1
	MOVFL	FPSCR, F0
	MOVFL	F0, FPSCR
	MOVFL	F0, $7, FPSCR
	MOVFL	$0, FPSCR(3)
.P2
1219producing
1220.CW mffs ,
1221.CW mtfsf
1222or
1223.CW mtfsfi ,
1224as appropriate.
1225.SH
1226ARM
1227.PP
1228The assembler provides access to
1229.CW R0
1230through
1231.CW R14
1232and the
1233.CW PC .
1234The stack pointer is
1235.CW R13 ,
1236the link register is
1237.CW R14 ,
1238and the static base register is
1239.CW R12 .
1240.CW R0
1241is the return register and also the register holding
1242the first argument to a subroutine.
1243The external registers in Plan 9's C are allocated from
1244.CW R10
1245down.
1246.CW R11
1247is used by the loader as a temporary register.
1248The assembler supports the
1249.CW CPSR
1250and
1251.CW SPSR
1252registers.
1253It also knows about coprocessor registers
1254.CW C0
1255through
1256.CW C15 .
1257Floating registers are
1258.CW F0
1259through
1260.CW F7 ,
1261.CW FPSR
1262and
1263.CW FPCR .
1264.PP
1265As with the other architectures, loads and stores are called
1266.CW MOV ,
1267e.g.
1268.CW MOVW
1269for load word or store word, and
1270.CW MOVM
1271for
1272load or store multiple,
1273depending on the operands.
1274.PP
1275Addressing modes are supported by suffixes to the instructions:
1276.CW .IA
1277(increment after),
1278.CW .IB
1279(increment before),
1280.CW .DA
1281(decrement after), and
1282.CW .DB
1283(decrement before).
1284These can only be used with the
1285.CW MOV
1286instructions.
1287The move multiple instruction,
1288.CW MOVM ,
1289defines a range of registers using brackets, e.g.
1290.CW [R0-R12] .
1291The special
1292.CW MOVM
1293addressing mode bits
1294.CW W ,
1295.CW U ,
1296and
1297.CW P
1298are written in the same manner, for example,
1299.CW MOVM.DB.W .
1300A
1301.CW .S
1302suffix allows a
1303.CW MOVM
1304instruction to access user
1305.CW R13
1306and
1307.CW R14
1308when in another processor mode.
1309Shifts and rotates in addressing modes are supported by binary operators
1310.CW <<
1311(logical left shift),
1312.CW >>
1313(logical right shift),
1314.CW ->
1315(arithmetic right shift), and
1316.CW @>
1317(rotate right); for example
1318.CW "R7>>R2" or
1319.CW "R2@>2" .
1320The assembler does not support indexing by a shifted expression;
1321only names can be doubly indexed.
1322.PP
1323Any instruction can be followed by a suffix that makes the instruction conditional:
1324.CW .EQ ,
1325.CW .NE ,
1326and so on, as in the ARM manual, with synonyms
1327.CW .HS
1328(for
1329.CW .CS )
1330and
1331.CW .LO
1332(for
1333.CW .CC ),
1334for example
1335.CW ADD.NE .
1336Arithmetic
1337and logical instructions
1338can have a
1339.CW .S
1340suffix, as ARM allows, to set condition codes.
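.PP
A sketch combining these suffixes (the choice of registers is arbitrary):
.P1
	MOVM.DB.W	[R4-R7], (R13)	/* push R4 through R7 */
	SUB.S	$1, R0			/* decrement, setting condition codes */
	ADD.NE	$4, R1			/* executed only if the result was non-zero */
	MOVM.IA.W	(R13), [R4-R7]	/* pop R4 through R7 */
.P2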
1341.PP
1342The syntax of the
1343.CW MCR
1344and
1345.CW MRC
1346coprocessor instructions is largely as in the manual, with the usual adjustments.
1347The assembler directly supports only the ARM floating-point coprocessor
1348operations used by the compiler:
1349.CW CMP ,
1350.CW ADD ,
1351.CW SUB ,
1352.CW MUL ,
1353and
1354.CW DIV ,
1355all with
1356.CW F
1357or
1358.CW D
1359suffix selecting single or double precision.
1360Floating-point load or store become
1361.CW MOVF
1362and
1363.CW MOVD .
1364Conversion instructions are also specified by moves:
1365.CW MOVWD ,
1366.CW MOVWF ,
1367.CW MOVDW ,
.CW MOVFW ,
1369.CW MOVFD ,
1370and
1371.CW MOVDF .
1372