.ft CW
.ta 8n +8n +8n +8n +8n +8n +8n
.ft
.TL
A Manual for the Plan 9 assembler
.AU
.I "Rob Pike"
.AI
rob@plan9.bell-labs.com
.SH
Machines
.PP
There is an assembler for each of the MIPS, SPARC, Intel 386, AMD64,
Motorola 68020 and 68000, IBM PowerPC, DEC Alpha, and ARM.
The 68020 assembler,
.CW 2a ,
is the oldest and in many ways the prototype.
The assemblers are really just variations of a single program:
they share many properties such as left-to-right assignment order for
instruction operands and the synthesis of macro instructions
such as
.CW MOVE
to hide the peculiarities of the load and store structure of the machines.
To keep things concrete, the first part of this manual is
specifically about the 68020.
At the end is a description of the differences among
the other assemblers.
.ig
.PP
The document, ``How to Use the Plan 9 C Compiler'', by Rob Pike,
is a prerequisite for this manual.
..
.SH
Registers
.PP
All pre-defined symbols in the assembler are upper-case.
Data registers are
.CW R0
through
.CW R7 ;
address registers are
.CW A0
through
.CW A7 ;
floating-point registers are
.CW F0
through
.CW F7 .
.PP
A pointer in
.CW A6
is used by the C compiler to point to data, enabling short addresses to
be used more often.
The value of
.CW A6
is constant and must be set during C program initialization
to the address of the externally-defined symbol
.CW a6base .
.PP
The following hardware registers are defined in the assembler; their
meaning should be obvious given a 68020 manual:
.CW CAAR ,
.CW CACR ,
.CW CCR ,
.CW DFC ,
.CW ISP ,
.CW MSP ,
.CW SFC ,
.CW SR ,
.CW USP ,
and
.CW VBR .
.PP
The assembler also defines several pseudo-registers that
manipulate the stack:
.CW FP ,
.CW SP ,
and
.CW TOS .
.CW FP
is the frame pointer, so
.CW 0(FP)
is the first argument,
.CW 4(FP)
is the second, and so on.
.CW SP
is the local stack pointer, where automatic variables are held
(SP is a pseudo-register only on the 68020);
.CW 0(SP)
is the first automatic, and so on as with
.CW FP .
Finally,
.CW TOS
is the top-of-stack register, used for pushing parameters to procedures,
saving temporary values, and so on.
.PP
The assembler and loader track these pseudo-registers so
the above statements are true regardless of what has been
pushed on the hardware stack, pointed to by
.CW A7 .
The name
.CW A7
refers to the hardware stack pointer, but beware of mixing uses of
.CW A7
with the stack-related pseudo-registers above; doing so will cause trouble.
Note, too, that the loader observes that the
.CW PEA
instruction alters SP and therefore inserts a corresponding pop before all returns.
The assembler accepts a label-like name to be attached to
.CW FP
and
.CW SP
uses, such as
.CW p+0(FP) ,
to help document that
.CW p
is the first argument to a routine.
The name goes in the symbol table but has no significance to the result
of the program.
.SH
Referring to data
.PP
All external references must be made relative to some pseudo-register,
either
.CW PC
(the virtual program counter) or
.CW SB
(the ``static base'' register).
.CW PC
counts instructions, not bytes of data.
For example, to branch to the second following instruction, that is,
to skip one instruction, one may write
.P1
	BRA	2(PC)
.P2
Labels are also allowed, as in
.P1
	BRA	return
	NOP
return:
	RTS
.P2
When using labels, there is no
.CW (PC)
annotation.
.PP
The pseudo-register
.CW SB
refers to the beginning of the address space of the program.
Thus, references to global data and procedures are written as
offsets to
.CW SB ,
as in
.P1
	MOVL	$array(SB), TOS
.P2
to push the address of a global array on the stack, or
.P1
	MOVL	array+4(SB), TOS
.P2
to push the second (4-byte) element of the array.
Note the use of an offset; the complete list of addressing modes is given below.
Similarly, subroutine calls must use
.CW SB :
.P1
	BSR	exit(SB)
.P2
File-static variables have syntax
.P1
	local<>+4(SB)
.P2
The
.CW <>
will be filled in at load time by a unique integer.
.PP
When a program starts, it must execute
.P1
	MOVL	$a6base(SB), A6
.P2
before accessing any global data.
(On machines such as the MIPS and SPARC that cannot load a 32-bit constant
into a register in a single instruction, constants are loaded through the static base
register.  The loader recognizes code that initializes the static
base register and treats it specially.  You must be careful, however,
not to load large constants on such machines when the static base
register is not set up, such as early in interrupt routines.)
.SH
Expressions
.PP
Expressions are mostly what one might expect.
Where an offset or a constant is expected,
a primary expression with unary operators is allowed.
A general C constant expression is allowed in parentheses.
.PP
Source files are preprocessed exactly as in the C compiler, so
.CW #define
and
.CW #include
work.
.SH
Addressing modes
.PP
The simple addressing modes are shared by all the assemblers.
Here, for completeness, follows a table of all the 68020 addressing modes,
since that machine has the richest set.
In the table,
.CW o
is an offset, which if zero may be elided, and
.CW d
is a displacement, which is a constant between -128 and 127 inclusive.
Many of the modes listed have the same name;
scrutiny of the format will show what default is being applied.
For instance, indexed mode with no address register supplied operates
as though a zero-valued register were used.
For ``offset'' read ``displacement.''
For ``\f(CW.s\fP'' read either
.CW .L
or
.CW .W
followed by
.CW *1 ,
.CW *2 ,
.CW *4 ,
or
.CW *8
to indicate the size and scaling of the data.
.IP
.TS
l lfCW.
data register	R0
address register	A0
floating-point register	F0
special names	CAAR, CACR, etc.
constant	$con
floating point constant	$fcon
external symbol	name+o(SB)
local symbol	name<>+o(SB)
automatic symbol	name+o(SP)
argument	name+o(FP)
address of external	$name+o(SB)
address of local	$name<>+o(SB)
indirect post-increment	(A0)+
indirect pre-decrement	-(A0)
indirect with offset	o(A0)
indexed with offset	o()(R0.s)
indexed with offset	o(A0)(R0.s)
external indexed	name+o(SB)(R0.s)
local indexed	name<>+o(SB)(R0.s)
automatic indexed	name+o(SP)(R0.s)
parameter indexed	name+o(FP)(R0.s)
offset indirect post-indexed	d(o())(R0.s)
offset indirect post-indexed	d(o(A0))(R0.s)
external indirect post-indexed	d(name+o(SB))(R0.s)
local indirect post-indexed	d(name<>+o(SB))(R0.s)
automatic indirect post-indexed	d(name+o(SP))(R0.s)
parameter indirect post-indexed	d(name+o(FP))(R0.s)
offset indirect pre-indexed	d(o()(R0.s))
offset indirect pre-indexed	d(o(A0))
offset indirect pre-indexed	d(o(A0)(R0.s))
external indirect pre-indexed	d(name+o(SB))
external indirect pre-indexed	d(name+o(SB)(R0.s))
local indirect pre-indexed	d(name<>+o(SB))
local indirect pre-indexed	d(name<>+o(SB)(R0.s))
automatic indirect pre-indexed	d(name+o(SP))
automatic indirect pre-indexed	d(name+o(SP)(R0.s))
parameter indirect pre-indexed	d(name+o(FP))
parameter indirect pre-indexed	d(name+o(FP)(R0.s))
.TE
.in
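.PP
The memory-indirect rows of the table are the least obvious.
As an illustration only, the C sketch below computes the effective address
for three of the forms, following the 68020's rules for memory indirection.
The function names are hypothetical, memory is modeled as an array of 32-bit words
in which
.CW mem[addr/4]
is the word at
.CW addr ,
the index register is taken to be a long scaled by
.CW s ,
and
.CW o
and
.CW d
are read as the inner and outer displacements of the notation above.
.P1
#include <stdint.h>

/* o(A0)(R0.L*s): address register plus offset plus scaled index */
uint32_t
ea_indexed(uint32_t a0, int32_t o, uint32_t r0, uint32_t s)
{
	return a0 + (uint32_t)o + r0*s;
}

/* d(o(A0))(R0.L*s): fetch a pointer from o(A0), then add the scaled index and d */
uint32_t
ea_postindexed(const uint32_t *mem, uint32_t a0, int32_t o,
	uint32_t r0, uint32_t s, int32_t d)
{
	uint32_t ptr;

	ptr = mem[(a0 + (uint32_t)o)/4];
	return ptr + r0*s + (uint32_t)d;
}

/* d(o(A0)(R0.L*s)): index first, fetch a pointer from there, then add d */
uint32_t
ea_preindexed(const uint32_t *mem, uint32_t a0, int32_t o,
	uint32_t r0, uint32_t s, int32_t d)
{
	uint32_t ptr;

	ptr = mem[(a0 + (uint32_t)o + r0*s)/4];
	return ptr + (uint32_t)d;
}
.P2
Passing 0 for
.CW a0
models the forms written with an empty
.CW o() ,
which behave as though a zero-valued address register were used.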
.SH
Laying down data
.PP
Placing data in the instruction stream, say for interrupt vectors, is easy:
the pseudo-instructions
.CW LONG
and
.CW WORD
(but not
.CW BYTE )
lay down the value of their single argument, of the appropriate size,
as if it were an instruction:
.P1
	LONG	$12345
.P2
places the long 12345 (base 10)
in the instruction stream.
(On most machines,
the only such operator is
.CW WORD
and it lays down 32-bit quantities.
The 386 has all three:
.CW LONG ,
.CW WORD ,
and
.CW BYTE .
The AMD64 adds
.CW QUAD
for 64-bit values.)
.PP
Placing information in the data section is more painful.
The pseudo-instruction
.CW DATA
does the work, given two arguments: an address at which to place the item,
including its size,
and the value to place there.  For example, to define a character array
.CW array
containing the characters
.CW abc
and a terminating null:
.P1
	DATA    array+0(SB)/1, $'a'
	DATA    array+1(SB)/1, $'b'
	DATA    array+2(SB)/1, $'c'
	GLOBL   array(SB), $4
.P2
or
.P1
	DATA    array+0(SB)/4, $"abc\ez"
	GLOBL   array(SB), $4
.P2
The
.CW /1
gives the size of the item in bytes,
.CW GLOBL
makes the symbol global, and the
.CW $4
says how many bytes the symbol occupies.
Uninitialized data is zeroed automatically.
The character
.CW \ez
is equivalent to the C
.CW \e0 .
The string in a
.CW DATA
statement may contain a maximum of eight bytes;
build larger strings piecewise.
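.PP
As a sketch of that piecewise construction, the hypothetical C helper below
(not part of the Plan 9 tools) prints
.CW DATA
and
.CW GLOBL
statements for a printable string.
It makes two assumptions worth stating:
it only emits chunks of 8, 4, 2, or 1 bytes, to stay close to the sizes shown above,
and it leaves out the terminating null, relying on uninitialized data being zeroed
and making the
.CW GLOBL
size one byte larger than the string.
Strings containing quotes, backslashes, or non-printing characters are rejected
rather than escaped.
.P1
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct chunk {
	size_t	off;	/* byte offset within the symbol */
	size_t	size;	/* 8, 4, 2, or 1 */
};

/*
 * Split a len-byte string into chunks of at most 8 bytes.
 * Returns the number of chunks needed; at most max are stored in c.
 */
size_t
datachunks(size_t len, struct chunk *c, size_t max)
{
	size_t off, n, size;

	off = 0;
	n = 0;
	while(off < len){
		size = 8;
		while(size > len - off)
			size /= 2;	/* largest of 8, 4, 2, 1 that fits */
		if(n < max){
			c[n].off = off;
			c[n].size = size;
		}
		n++;
		off += size;
	}
	return n;
}

/*
 * Print DATA and GLOBL statements defining sym as the string s.
 * Returns 0 on success, or -1 if s would need escapes or is too long.
 */
int
emitdata(const char *sym, const char *s)
{
	struct chunk c[64];
	char line[256];
	size_t i, n, len;

	len = strlen(s);
	for(i = 0; i < len; i++)
		if(!isprint((unsigned char)s[i]) || s[i] == '"' || s[i] == 0x5C)
			return -1;	/* 0x5C is a backslash */
	n = datachunks(len, c, 64);
	if(n > 64)
		return -1;
	for(i = 0; i < n; i++){
		snprintf(line, sizeof line, "	DATA	%s+%zu(SB)/%zu, $%c%.*s%c",
			sym, c[i].off, c[i].size, '"', (int)c[i].size, s + c[i].off, '"');
		puts(line);
	}
	snprintf(line, sizeof line, "	GLOBL	%s(SB), $%zu", sym, len + 1);
	puts(line);
	return 0;
}
.P2
For example, called with the symbol name
.CW greeting
and the 12-byte string
.CW "hello, world" ,
.CW emitdata
prints an 8-byte
.CW DATA
statement at offset 0, a 4-byte one at offset 8, and a
.CW GLOBL
of 13 bytes.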
.PP
Two pseudo-instructions,
.CW DYNT
and
.CW INIT ,
allow the (obsolete) Alef compilers to build dynamic type information during the load
phase.
The
.CW DYNT
pseudo-instruction has two forms:
.P1
	DYNT	, ALEF_SI_5+0(SB)
	DYNT	ALEF_AS+0(SB), ALEF_SI_5+0(SB)
.P2
In the first form,
.CW DYNT
defines the symbol to be a small unique integer constant, chosen by the loader,
which is some multiple of the word size.  In the second form,
.CW DYNT
defines the second symbol in the same way,
places the address of the most recently
defined text symbol in the array specified by the first symbol at the
index defined by the value of the second symbol,
and then adjusts the size of the array accordingly.
.PP
The
.CW INIT
pseudo-instruction takes the same parameters as a
.CW DATA
statement.  Its symbol is used as the base of an array and the
data item is installed in the array at the offset specified by the most recent
.CW DYNT
pseudo-instruction.
The size of the array is adjusted accordingly.
The
.CW DYNT
and
.CW INIT
pseudo-instructions are not implemented on the 68020.
.SH
Defining a procedure
.PP
Entry points are defined by the pseudo-operation
.CW TEXT ,
which takes as arguments the name of the procedure (including the ubiquitous
.CW (SB) )
and the number of bytes of automatic storage to pre-allocate on the stack,
which will usually be zero when writing assembly language programs.
On machines with a link register, such as the MIPS and SPARC,
the special value -4 instructs the loader to generate no PC save
and restore instructions, even if the function is not a leaf.
Here is a complete procedure that returns the sum
of its two arguments:
.P1
TEXT	sum(SB), $0
	MOVL	arg1+0(FP), R0
	ADDL	arg2+4(FP), R0
	RTS
.P2
An optional middle argument
to the
.CW TEXT
pseudo-op is a bit field of options to the loader.
Setting the 1 bit exempts the function from profiling when profiling is enabled for the rest of
the program.
For example,
.P1
TEXT	sum(SB), 1, $0
	MOVL	arg1+0(FP), R0
	ADDL	arg2+4(FP), R0
	RTS
.P2
will not be profiled; the first version above would be.
Subroutines with peculiar state, such as system call routines,
should not be profiled.
.PP
Setting the 2 bit allows multiple definitions of the same
.CW TEXT
symbol in a program; the loader will place only one such function in the image.
This flag was emitted only by the Alef compilers.
.PP
Subroutines to be called from C should place their result in
.CW R0 ,
even if it is an address.
Floating point values are returned in
.CW F0 .
Functions that return a structure to a C program
receive as their first argument the address of the location to
store the result;
.CW R0
is unused in the calling protocol for such procedures.
The caller is responsible for saving any registers it needs preserved across a call,
so a subroutine is free to use any registers without saving them (``caller saves'').
.CW A6
and
.CW A7
are the exceptions, as described above.
.SH
When in doubt
.PP
If you get confused, try using the
.CW -S
option to
.CW 2c
and compiling a sample program.
The standard output is valid input to the assembler.
.SH
Instructions
.PP
The instruction set of the assembler is not identical to that
of the machine.
It is chosen to match what the compiler generates, augmented
slightly by specific needs of the operating system.
For example,
.CW 2a
does not distinguish between the various forms of
.CW MOVE
instruction: move quick, move address, etc.  Instead the context
does the job.  For example,
.P1
	MOVL	$1, R1
	MOVL	A0, R2
	MOVW	SR, R3
.P2
generates official
.CW MOVEQ ,
.CW MOVEA ,
and
.CW MOVESR
instructions.
A number of instructions do not have the syntax necessary to specify
their entire capabilities.  Notable examples are the bitfield
instructions, the
multiply and divide instructions, etc.
For a complete set of generated instruction names (in
.CW 2a
notation, not Motorola's) see the file
.CW /sys/src/cmd/2c/2.out.h .
Despite its name, this file contains an enumeration of the
instructions that appear in the intermediate files generated
by the compiler, which correspond exactly to lines of assembly language.
.PP
The MC68000 assembler,
.CW 1a ,
is essentially the same, honoring the appropriate subset of the instructions
and addressing modes.
The definitions of these are, nonetheless, part of
.CW 2.out.h .
.SH
Laying down instructions
.PP
The loader modifies the code produced by the assembler and compiler.
It folds branches,
copies short sequences of code to eliminate branches,
and discards unreachable code.
The first instruction of every function is assumed to be reachable.
The pseudo-instruction
.CW NOP ,
which you may see in compiler output,
means no instruction at all, rather than an instruction that does nothing.
The loader discards all
.CW NOP 's.
.PP
To generate a true
.CW NOP
instruction, or any other instruction not known to the assembler, use a
.CW WORD
pseudo-instruction.
Such instructions on RISCs are not scheduled by the loader and must have
their delay slots filled manually.
.SH
MIPS
.PP
The registers are only addressed by number:
.CW R0
through
.CW R31 .
.CW R29
is the stack pointer;
.CW R30
is used as the static base pointer, the analogue of
.CW A6
on the 68020.
Its value is the address of the global symbol
.CW setR30(SB) .
The register holding returned values from subroutines is
.CW R1 .
When a function is called, space for the first argument
is reserved at
.CW 0(FP)
but in C (not Alef) the value is passed in
.CW R1
instead.
.PP
The loader uses
.CW R28
as a temporary.  The system uses
.CW R26
and
.CW R27
as interrupt-time temporaries.  Therefore none of these registers
should be used in user code.
.PP
The control registers are not known to the assembler.
Instead they are numbered registers
.CW M0 ,
.CW M1 ,
etc.
Use this trick to access, say,
.CW STATUS :
.P1
#define	STATUS	12
	MOVW	M(STATUS), R1
.P2
.PP
Floating point registers are called
.CW F0
through
.CW F31 .
By convention,
.CW F24
must be initialized to the value 0.0,
.CW F26
to 0.5,
.CW F28
to 1.0, and
.CW F30
to 2.0;
this is done by the operating system.
.PP
The instructions and their syntax are different from those of the manufacturer's
manual.
There are no
.CW lui
and kin; instead there are
.CW MOVW
(move word),
.CW MOVH
(move halfword),
and
.CW MOVB
(move byte) pseudo-instructions.  If the operand is unsigned, the instructions
are
.CW MOVHU
and
.CW MOVBU .
The order of operands is from left to right in dataflow order, just as
on the 68020 but not as in MIPS documentation.
This means that the
.CW Bcond
instructions are reversed with respect to the book; for example, a
.CW va
.CW BGTZ
generates a MIPS
.CW bltz
instruction.
.PP
The assembler is for the R2000, R3000, and most of the R4000 and R6000 architectures.
It understands the 64-bit instructions
.CW MOVV ,
.CW MOVVL ,
.CW ADDV ,
.CW ADDVU ,
.CW SUBV ,
.CW SUBVU ,
.CW MULV ,
.CW MULVU ,
.CW DIVV ,
.CW DIVVU ,
.CW SLLV ,
.CW SRLV ,
and
.CW SRAV .
The assembler does not have any cache, load-linked, or store-conditional instructions.
.PP
Some assembler instructions are expanded into multiple instructions by the loader.
For example, the loader may convert the load of a 32-bit constant into an
.CW lui
followed by an
.CW ori .
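.PP
The arithmetic behind that expansion is easy to check.
.CW lui
puts a 16-bit immediate in the upper half of a register and clears the lower half;
.CW ori
then ORs in a second 16-bit immediate, zero-extended.
The C sketch below (function names hypothetical) performs the split and reconstructs
the value the two instructions leave in the register.
It models only this two-instruction case; whether the loader uses a shorter
sequence for small constants is not shown.
.P1
#include <stdint.h>

struct luiori {
	uint16_t	hi;	/* immediate for lui: bits 31..16 */
	uint16_t	lo;	/* immediate for ori: bits 15..0 */
};

struct luiori
splitlui(uint32_t c)
{
	struct luiori p;

	p.hi = c >> 16;
	p.lo = c & 0xFFFF;
	return p;
}

/* The register value after lui followed by ori. */
uint32_t
afterluiori(struct luiori p)
{
	uint32_t r;

	r = (uint32_t)p.hi << 16;	/* lui: low half cleared */
	r |= p.lo;			/* ori: zero-extended, upper half unchanged */
	return r;
}
.P2
Because
.CW ori
zero-extends its immediate, the upper half never needs adjusting, even when
bit 15 of the constant is set; a sequence using the sign-extending
.CW addiu
would have to add one to the upper half in that case.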
.PP
Assembler instructions should be laid out as if there
were no load, branch, or floating point compare delay slots;
the loader will rearrange\(em\f2schedule\f1\(emthe instructions
to guarantee correctness and improve performance.
The only exception is that the correct scheduling of instructions
that use control registers varies from model to model of machine
(and is often undocumented) so you should schedule such instructions
by hand to guarantee correct behavior.
The loader generates
.P1
	NOR	R0, R0, R0
.P2
when it needs a true no-op instruction.
Use exactly this instruction when scheduling code manually;
the loader recognizes it and schedules the code before it and after it independently.  Also,
.CW WORD
pseudo-ops are scheduled like no-ops.
.PP
The
.CW NOSCHED
pseudo-op disables instruction scheduling
(scheduling is enabled by default);
.CW SCHED
re-enables it.
Branch folding, code copying, and dead code elimination are
disabled for instructions that are not scheduled.
.SH
SPARC
.PP
Once you understand the Plan 9 model for the MIPS, the SPARC is familiar.
Registers have numerical names only:
.CW R0
through
.CW R31 .
Forget about register windows: Plan 9 doesn't use them at all.
The machine has 32 global registers, period.
.CW R1
[sic] is the stack pointer.
.CW R2
is the static base register, with value the address of
.CW setSB(SB) .
.CW R7
is the return register and also the register holding the first
argument to a C (not Alef) function, again with space reserved at
.CW 0(FP) .
.CW R14
is the loader temporary.
.PP
Floating-point registers are exactly as on the MIPS.
.PP
The control registers are known by names such as
.CW FSR .
The instructions to access these registers are
.CW MOVW
instructions, for example
.P1
	MOVW	Y, R8
.P2
for the SPARC instruction
.P1
	rdy	%r8
.P2
.PP
Move instructions are similar to those on the MIPS: pseudo-operations
that turn into appropriate sequences of
.CW sethi
instructions, adds, etc.
Instructions read from left to right.  Because the arguments are
flipped to
.CW SUBCC ,
the condition codes are not inverted as on the MIPS.
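.PP
The SPARC split differs from the MIPS one:
.CW sethi
sets bits 31 through 10 of a register from a 22-bit immediate and clears bits 9 through 0,
so the remaining 10 bits are supplied by a following instruction with a small immediate
(an
.CW or ,
or an
.CW add ,
which gives the same result because those bits are zero).
The C sketch below, with hypothetical names, shows the split; it does not claim to
reproduce the exact sequence the loader emits.
.P1
#include <stdint.h>

struct sethilo {
	uint32_t	hi22;	/* immediate for sethi: bits 31..10 */
	uint32_t	lo10;	/* low 10 bits, 0..1023 */
};

struct sethilo
splitsethi(uint32_t c)
{
	struct sethilo p;

	p.hi22 = c >> 10;
	p.lo10 = c & 0x3FF;
	return p;
}

/* The register value after sethi followed by or. */
uint32_t
aftersethi(struct sethilo p)
{
	uint32_t r;

	r = p.hi22 << 10;	/* sethi: bits 9..0 cleared */
	r |= p.lo10;		/* fits easily in the 13-bit signed immediate */
	return r;
}
.P2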
.PP
The syntax for accesses to an alternate address space (ASI) is, for example, to move a word from ASI 2:
.P1
	MOVW	(R7, 2), R8
.P2
The syntax for double indexing is
.P1
	MOVW	(R7+R8), R9
.P2
.PP
The SPARC's instruction scheduling is similar to the MIPS's.
The official no-op instruction is:
.P1
	ORN	R0, R0, R0
.P2
.SH
i386
.PP
The assembler assumes 32-bit protected mode.
The register names are
.CW SP ,
.CW AX ,
.CW BX ,
.CW CX ,
.CW DX ,
.CW BP ,
.CW DI ,
and
.CW SI .
The stack pointer (not a pseudo-register) is
.CW SP
and the return register is
.CW AX .
There is no physical frame pointer but, as for the MIPS,
.CW FP
is a pseudo-register that acts as
a frame pointer.
.PP
Opcode names are mostly the same as those listed in the Intel manual
with an
.CW L ,
.CW W ,
or
.CW B
appended to identify 32-bit,
16-bit, and 8-bit operations.
The exceptions are loads, stores, and conditionals.
All load and store opcodes to and from general registers, special registers
(such as
.CW CR0,
.CW CR3,
.CW GDTR,
.CW IDTR,
.CW SS,
.CW CS,
.CW DS,
.CW ES,
.CW FS,
and
.CW GS )
or memory are written
as
.P1
	MOV\f2x\fP	src,dst
.P2
where
.I x
is
.CW L ,
.CW W ,
or
.CW B .
Thus to get
.CW AL
use a
.CW MOVB
instruction.  If you need to access
.CW AH ,
you must mention it explicitly in a
.CW MOVB :
.P1
	MOVB	AH, BX
.P2
Many moves that are illegal on the hardware, for example,
.P1
	MOVB	BP, DI
.P2
are actually implemented by the loader as pseudo-operations.
.PP
The names of conditions in all conditional instructions
.CW J , (
.CW SET )
follow the conventions of the 68020 instead of those of the Intel
assembler:
.CW JOS ,
.CW JOC ,
.CW JCS ,
.CW JCC ,
.CW JEQ ,
.CW JNE ,
.CW JLS ,
.CW JHI ,
.CW JMI ,
.CW JPL ,
.CW JPS ,
.CW JPC ,
.CW JLT ,
.CW JGE ,
.CW JLE ,
and
.CW JGT
instead of
.CW JO ,
.CW JNO ,
.CW JB ,
.CW JNB ,
.CW JZ ,
.CW JNZ ,
.CW JBE ,
.CW JNBE ,
.CW JS ,
.CW JNS ,
.CW JP ,
.CW JNP ,
.CW JL ,
.CW JNL ,
.CW JLE ,
and
.CW JNLE .
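.PP
The two lists correspond position by position.
When converting hand-written Intel-syntax code, a lookup table such as the
hypothetical C sketch below makes the pairing explicit.
.P1
#include <stddef.h>
#include <string.h>

/* Plan 9 name and Intel name, paired in the order listed above. */
static const char *condmap[][2] = {
	{"JOS", "JO"},	{"JOC", "JNO"},
	{"JCS", "JB"},	{"JCC", "JNB"},
	{"JEQ", "JZ"},	{"JNE", "JNZ"},
	{"JLS", "JBE"},	{"JHI", "JNBE"},
	{"JMI", "JS"},	{"JPL", "JNS"},
	{"JPS", "JP"},	{"JPC", "JNP"},
	{"JLT", "JL"},	{"JGE", "JNL"},
	{"JLE", "JLE"},	{"JGT", "JNLE"},
};

/* Return the Intel mnemonic for a Plan 9 conditional jump, or NULL. */
const char *
intelcond(const char *plan9)
{
	size_t i;

	for(i = 0; i < sizeof condmap / sizeof condmap[0]; i++)
		if(strcmp(condmap[i][0], plan9) == 0)
			return condmap[i][1];
	return NULL;
}
.P2
Only
.CW JLE
is spelled the same way in both conventions.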
.PP
The addressing modes have syntax like
.CW AX ,
.CW (AX) ,
.CW (AX)(BX*4) ,
.CW 10(AX) ,
and
.CW 10(AX)(BX*4) .
The offsets from
.CW AX
can be replaced by offsets from
.CW FP
or
.CW SB
to access names, for example
.CW extern+5(SB)(AX*2) .
.PP
Other notes: Non-relative
.CW JMP
and
.CW CALL
have a
.CW *
added to the syntax.
Only
.CW LOOP ,
.CW LOOPEQ ,
and
.CW LOOPNE
are legal loop instructions.  Only
.CW REP
and
.CW REPN
are recognized repeaters.  These are not prefixes, but rather
stand-alone opcodes that precede the strings, for example
.P1
	CLD; REP; MOVSL
.P2
Segment override prefixes in
.CW MOD/RM
fields are not supported.
.SH
AMD64
.PP
The assembler's conventions are similar to those for the 386, above.
The architecture provides extra fixed-point registers
.CW R8
to
.CW R15 .
All registers are 64 bits, but instructions access the low-order 8, 16, and 32 bits
as described in the processor handbook.
For example,
.CW MOVL
to
.CW AX
puts a value in the low-order 32 bits and clears the top 32 bits to zero.
Literal operands are limited to signed 32-bit values, which are sign-extended
to 64 bits in 64-bit operations; the exception is
.CW MOVQ ,
which allows 64-bit literals.
MMX registers are
.CW M0
to
.CW M7 ,
and
XMM registers are
.CW X0
to
.CW X15 .
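.PP
The register-width and literal rules can be modeled in a few lines of C.
In the sketch below, whose names are hypothetical,
.CW aftermov
gives a register's new contents after a move of the given width,
following the processor's rule that 32-bit writes clear the upper half
while 8- and 16-bit writes leave the other bits alone, and
.CW fitsimm32
reports whether a 64-bit value can be written as a literal in an instruction other than
.CW MOVQ .
.P1
#include <stdint.h>

/* New contents of a 64-bit register after a move of width bits (8, 16, 32, or 64). */
uint64_t
aftermov(uint64_t old, uint64_t v, int bits)
{
	uint64_t mask;

	switch(bits){
	case 8:
	case 16:
		mask = ((uint64_t)1 << bits) - 1;
		return (old & ~mask) | (v & mask);	/* other bits preserved */
	case 32:
		return v & 0xFFFFFFFF;	/* MOVL: upper 32 bits cleared */
	default:
		return v;		/* MOVQ */
	}
}

/* Nonzero if v survives truncation to 32 bits and sign extension back to 64. */
int
fitsimm32(int64_t v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}
.P2
For example, 0x7FFFFFFF passes, while 0x80000000 does not,
since sign-extending its low 32 bits gives 0xFFFFFFFF80000000.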
.PP
There are many new instructions, including the MMX and XMM media instructions,
and conditional move instructions.
As with the 386 instruction names,
all new 64-bit integer instructions, and the MMX and XMM instructions
uniformly use
.CW L
for `long word' (32 bits) and
.CW Q
for `quad word' (64 bits).
Some instructions use
.CW O
(`octword') for 128-bit values, where the processor handbook
variously uses
.CW O
or
.CW DQ .
The assembler also consistently uses
.CW PL
for `packed long' in
XMM instructions, instead of
.CW Q ,
.CW DQ
or
.CW PI .
Either
.CW MOVL
or
.CW MOVQ
can be used to move values to and from control registers, even when
the registers might be 64 bits.
The assembler often accepts the handbook's name to ease conversion
of existing code (but remember that the operand order is uniformly
source then destination).
.PP
C's
.CW "long long"
type is 64 bits, but passed and returned by value, not by reference.
More notably, C pointer values are 64 bits, and thus
.CW "long long"
and
.CW "unsigned long long"
are the only integer types wide enough to hold a pointer value.
The C compiler and library use the XMM floating-point instructions, not
the old 387 ones, although the latter are implemented by assembler and loader.
The compiler provides external registers,
allocated from
.CW R15
down.
.PP
The calling conventions are different from the 386.
.CW CALL
pushes, and
.CW RET
pops, a 64-bit return address on the stack.
The first integer or pointer argument is passed in register
.CW BP ,
which can be referred to in assembly code by the pseudonym
.CW RARG .
.CW AX
holds the return value from subroutines as before.
Floating-point results are returned in
.CW X0 ,
although currently the first parameter is not passed in a register if it is floating-point.
All parameters less than 8 bytes in length have 8-byte slots reserved on the stack
to preserve alignment and simplify variable-length argument list access,
including the first parameter when passed in a register,
although bytes 4 to 7 are not initialized.
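.PP
The slot rule makes argument offsets easy to compute.
The C sketch below (hypothetical name) assigns
.CW FP
offsets to parameters of the given sizes, assuming, as for the other machines,
that the first parameter is at
.CW 0(FP) .
The text above covers only parameters shorter than 8 bytes;
rounding larger ones up to a multiple of 8 is an additional assumption.
.P1
#include <stddef.h>

/*
 * Store in off[i] the FP offset of parameter i, given its size in bytes.
 * Returns the total number of bytes of argument space.
 */
size_t
argoffsets(const size_t *size, size_t *off, size_t n)
{
	size_t i, o;

	o = 0;
	for(i = 0; i < n; i++){
		off[i] = o;
		o += (size[i] + 7) & ~(size_t)7;	/* round up to whole 8-byte slots */
	}
	return o;
}
.P2
For a function taking an
.CW int ,
a
.CW "long long" ,
and a
.CW char ,
this gives offsets 0, 8, and 16; the
.CW int
also arrives in
.CW BP .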
.PP
The assembler assumes 64-bit mode unless a
.CW MODE
pseudo-operation is given:
.P1
	MODE $32
.P2
to change to 32-bit mode.
The effect is mainly to diagnose instructions that are illegal in
the given mode, but the loader will also assume 32-bit operands and addresses,
and 32-bit PC values for call and return.
.SH
Alpha
.PP
On the Alpha, all registers are 64 bits.  The architecture handles 32-bit values
by giving them a canonical format (sign extension in the case of integer registers).
Registers are numbered
.CW R0
through
.CW R31 .
.CW R0
holds the return value from subroutines, and also the first parameter.
.CW R30
is the stack pointer,
.CW R29
is the static base,
.CW R26
is the link register, and
.CW R27
and
.CW R28
are linker temporaries.
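.PP
The canonical format for a 32-bit integer is its sign extension to 64 bits,
so a register holds a canonical 32-bit value only if its upper 32 bits are all copies of bit 31.
The C sketch below (hypothetical names) models the format itself, not any particular instruction.
.P1
#include <stdint.h>

/* Canonical 64-bit register form of a 32-bit value. */
uint64_t
canon32(uint32_t v)
{
	uint64_t r;

	r = v;
	if(v & 0x80000000u)
		r |= 0xFFFFFFFF00000000ull;	/* copy bit 31 into bits 63..32 */
	return r;
}

/* Nonzero if register contents r are already in canonical 32-bit form. */
int
iscanon32(uint64_t r)
{
	return canon32((uint32_t)r) == r;
}
.P2
For example, the pattern 0x00000000FFFFFFFF is not canonical,
while 0xFFFFFFFFFFFFFFFF is the canonical form of the 32-bit value -1.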
.PP
Floating point registers are numbered
.CW F0
to
.CW F31 .
.CW F28
contains
.CW 0.5 ,
.CW F29
contains
.CW 1.0 ,
and
.CW F30
contains
.CW 2.0 .
.CW F31
is always
.CW 0.0
on the Alpha.
.PP
The extension character for
.CW MOV
follows DEC's notation:
.CW B
for byte (8 bits),
.CW W
for word (16 bits),
.CW L
for long (32 bits),
and
.CW Q
for quadword (64 bits).
Byte and ``word'' loads and stores may be made unsigned
by appending a
.CW U .
.CW S
and
.CW T
refer to IEEE floating point single precision (32 bits) and double precision (64 bits), respectively.
.SH
PowerPC
.PP
The PowerPC follows the Plan 9 model set by the MIPS and SPARC,
not the elaborate ABIs.
The 32-bit instructions of the 60x and 8xx PowerPC architectures are supported;
there is no support for the older POWER instructions.
Registers are
.CW R0
through
.CW R31 .
.CW R0
is initialized to zero; this is done by C start-up code
and assumed by the compiler and loader.
.CW R1
is the stack pointer.
.CW R2
is the static base register, with value the address of
.CW setSB(SB) .
.CW R3
is the return register and also the register holding the first
argument to a C function, with space reserved at
.CW 0(FP)
as on the MIPS.
.CW R31
is the loader temporary.
The external registers in Plan 9's C are allocated from
.CW R30
down.
.PP
Floating point registers are called
.CW F0
through
.CW F31 .
By convention, several registers are initialized
to specific values; this is done by the operating system.
.CW F27
must be initialized to the value
.CW 0x4330000080000000
(used in conversions between integers and floating point),
.CW F28
to the value 0.0,
.CW F29
to 0.5,
.CW F30
to 1.0, and
.CW F31
to 2.0.
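.PP
The constant in
.CW F27
is the double-precision value 2^52 + 2^31.
One well-known use of such a constant, sketched in C below, converts a signed 32-bit
integer to a double using integer operations and a single floating-point subtraction:
the integer, with its sign bit flipped, is placed in the low word of a double whose high word is
.CW 0x43300000 ,
and subtracting the constant leaves the integer's value.
Whether the compiler generates exactly this sequence is an assumption here,
as are the function names; the sketch also assumes IEEE 754 doubles.
.P1
#include <stdint.h>
#include <string.h>

static double
bitsdouble(uint64_t b)
{
	double d;

	memcpy(&d, &b, sizeof d);	/* reinterpret the bits as a double */
	return d;
}

double
int2double(int32_t x)
{
	uint64_t b;

	/* high word 0x43300000 makes the value 2^52 + (low word) */
	b = 0x4330000000000000ull | ((uint32_t)x ^ 0x80000000u);
	/* (2^52 + x + 2^31) - (2^52 + 2^31) == x, exactly */
	return bitsdouble(b) - bitsdouble(0x4330000080000000ull);
}
.P2
The subtraction is exact because both operands and the result are integers
smaller than 2^53.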
.PP
As on the MIPS and SPARC, the assembler accepts arbitrary literals
as operands to
.CW MOVW ,
and also to
.CW ADD
and others where `immediate' variants exist,
and the loader generates sequences
of
.CW addi ,
.CW addis ,
.CW oris ,
etc. as required.
The register indirect addressing modes use the same syntax as the SPARC,
including double indexing when allowed.
.PP
The instruction names are generally derived from the Motorola ones,
subject to slight transformation:
the
.CW . ' `
marking the setting of condition codes is replaced by
.CW CC ,
and when the letter
.CW o ' `
represents `OE=1' it is replaced by
.CW V .
Thus
.CW add ,
.CW addo.
and
.CW subfzeo.
become
.CW ADD ,
.CW ADDVCC
and
.CW SUBFZEVCC .
As well as the three-operand conditional branch instruction
.CW BC ,
the assembler provides pseudo-instructions for the common cases:
.CW BEQ ,
.CW BNE ,
.CW BGT ,
.CW BGE ,
.CW BLT ,
.CW BLE ,
.CW BVC ,
and
.CW BVS .
The unconditional branch instruction is
.CW BR .
Indirect branches use
.CW "(CTR)"
or
.CW "(LR)"
as target.
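.PP
The renaming rule is mechanical once the base mnemonic and its two suffix flags are known.
The hypothetical C helper below applies it; the caller supplies the base mnemonic
and the flags rather than having the helper parse Motorola spelling.
.P1
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Build the assembler name for base mnemonic base (without o or . suffix).
 * oe: the Motorola name has the o suffix (OE=1); rc: it has the . suffix.
 * Returns 0 on success, -1 if out (capacity cap) is too small.
 */
int
ppcname(char *out, size_t cap, const char *base, int oe, int rc)
{
	size_t i, n, len;

	len = strlen(base);
	if(len + (oe ? 1 : 0) + (rc ? 2 : 0) + 1 > cap)
		return -1;
	for(i = 0; i < len; i++)
		out[i] = toupper((unsigned char)base[i]);
	n = len;
	if(oe)
		out[n++] = 'V';	/* OE=1 */
	if(rc){
		out[n++] = 'C';	/* sets condition codes */
		out[n++] = 'C';
	}
	out[n] = 0;
	return 0;
}
.P2
Called with base
.CW subfze
and both flags set, it produces
.CW SUBFZEVCC ,
matching the example above.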
.PP
Load or store operations are replaced by
.CW MOV
variants in the usual way:
.CW MOVW
(move word),
.CW MOVH
(move halfword with sign extension), and
.CW MOVB
(move byte with sign extension, a pseudo-instruction),
with unsigned variants
.CW MOVHZ
and
.CW MOVBZ ,
and byte-reversing
.CW MOVWBR
and
.CW MOVHBR .
`Load or store with update' versions are
.CW MOVWU ,
.CW MOVHU ,
and
.CW MOVBZU .
Load or store multiple is
.CW MOVMW .
The exceptions are the string instructions, which are
.CW LSW
and
.CW STSW ,
and the reservation instructions
.CW lwarx
and
.CW stwcx. ,
which are
.CW LWAR
and
.CW STWCCC ,
all with operands in the usual data-flow order.
Floating-point load or store instructions are
.CW FMOVD ,
.CW FMOVDU ,
.CW FMOVS ,
and
.CW FMOVSU .
The register-to-register move instructions
.CW fmr
and
.CW fmr.
are written
.CW FMOVD
and
.CW FMOVDCC .
.PP
The assembler knows the commonly used special purpose registers:
.CW CR ,
.CW CTR ,
.CW DEC ,
.CW LR ,
.CW MSR ,
and
.CW XER .
The rest, which are often architecture-dependent, are referenced as
.CW SPR(n) .
The segment registers of the 60x series are similarly
.CW SEG(n) ,
but
.I n
can also be a register name, as in
.CW SEG(R3) .
Moves between special purpose registers and general purpose ones,
when allowed by the architecture,
are written as
.CW MOVW ,
replacing
.CW mfcr ,
.CW mtcr ,
.CW mfmsr ,
.CW mtmsr ,
.CW mtspr ,
.CW mfspr ,
.CW mftb ,
and many others.
.PP
The fields of the condition register
.CW CR
are referenced as
.CW CR(0)
through
.CW CR(7) .
They are used by the
.CW MOVFL
(move field) pseudo-instruction,
which produces
.CW mcrf
or
.CW mtcrf .
For example:
.P1
	MOVFL	CR(3), CR(0)
	MOVFL	R3, CR(1)
	MOVFL	R3, $7, CR
.P2
They are also accepted in
the conditional branch instruction, for example
.P1
	BEQ	CR(7), label
.P2
Fields of the
.CW FPSCR
are accessed using
.CW MOVFL
in a similar way:
.P1
	MOVFL	FPSCR, F0
	MOVFL	F0, FPSCR
	MOVFL	F0, $7, FPSCR
	MOVFL	$0, FPSCR(3)
.P2
producing
.CW mffs ,
.CW mtfsf ,
or
.CW mtfsfi
as appropriate.
.SH
ARM
.PP
The assembler provides access to
.CW R0
through
.CW R14
and the
.CW PC .
The stack pointer is
.CW R13 ,
the link register is
.CW R14 ,
and the static base register is
.CW R12 .
.CW R0
is the return register and also the register holding
the first argument to a subroutine.
The assembler supports the
.CW CPSR
and
.CW SPSR
registers.
It also knows about coprocessor registers
.CW C0
through
.CW C15 .
The floating-point registers are
.CW F0
through
.CW F7 ,
together with the status and control registers
.CW FPSR
and
.CW FPCR .
.PP
As with the other architectures, loads and stores are called
.CW MOV ,
e.g.
.CW MOVW
for load word or store word, and
.CW MOVM
for
load or store multiple,
depending on the operands.
.PP
Addressing modes are supported by suffixes to the instructions:
.CW .IA
(increment after),
.CW .IB
(increment before),
.CW .DA
(decrement after), and
.CW .DB
(decrement before).
These can only be used with the
.CW MOV
instructions.
The move multiple instruction,
.CW MOVM ,
defines a range of registers using brackets, e.g.
.CW [R0-R12] .
The special
.CW MOVM
addressing mode bits
.CW W ,
.CW U ,
and
.CW P
are written in the same manner, for example,
.CW MOVM.DB.W .
A
.CW .S
suffix allows a
.CW MOVM
instruction to access user
.CW R13
and
.CW R14
when in another processor mode.
Shifts and rotates in addressing modes are supported by binary operators
.CW <<
(logical left shift),
.CW >>
(logical right shift),
.CW ->
(arithmetic right shift), and
.CW @>
(rotate right); for example
.CW "R7>>R2"
or
.CW "R2@>2" .
The assembler does not support indexing by a shifted expression;
only names can be doubly indexed.
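.PP
The four shift operators correspond to the ARM barrel shifter's operations.
The C sketch below (hypothetical names) gives the value such an operand contributes,
for example
.CW R2@>2 ,
for shift amounts 1 through 31 only; the architecture's special cases for a shift of zero
or a register-specified shift of 32 or more are not modeled.
.P1
#include <stdint.h>

enum armshift { SHL, SHR, SAR, ROR };	/* <<, >>, ->, @> */

uint32_t
shiftop(uint32_t v, enum armshift op, unsigned n)	/* 1 <= n <= 31 */
{
	switch(op){
	case SHL:
		return (uint32_t)(v << n);
	case SHR:
		return v >> n;
	case SAR:
		if(v & 0x80000000u)
			return (v >> n) | ~(0xFFFFFFFFu >> n);	/* fill with sign bits */
		return v >> n;
	case ROR:
		return (v >> n) | (uint32_t)(v << (32 - n));
	}
	return v;
}
.P2
For instance, shifting 0x80000000 right by 4 gives 0x08000000 with
.CW >>
but 0xF8000000 with
.CW -> .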
.PP
Any instruction can be followed by a suffix that makes the instruction conditional:
.CW .EQ ,
.CW .NE ,
and so on, as in the ARM manual, with synonyms
.CW .HS
(for
.CW .CS )
and
.CW .LO
(for
.CW .CC ),
for example
.CW ADD.NE .
Arithmetic
and logical instructions
can have a
.CW .S
suffix, as ARM allows, to set condition codes.
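.PP
Each suffix selects one value of the instruction's 4-bit condition field.
The encodings in the C sketch below are the ARM architecture's;
the function name is hypothetical, and whether the assembler accepts every name in the table
(such as
.CW AL )
is not stated here.
.P1
#include <string.h>

/* ARM condition names, indexed by their 4-bit encoding. */
static const char *condname[] = {
	"EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
	"HI", "LS", "GE", "LT", "GT", "LE", "AL",
};

/* Map a suffix (without the dot) to its condition field, or -1. */
int
condfield(const char *s)
{
	int i;

	if(strcmp(s, "HS") == 0)
		s = "CS";	/* synonyms accepted by the assembler */
	else if(strcmp(s, "LO") == 0)
		s = "CC";
	for(i = 0; i < (int)(sizeof condname / sizeof condname[0]); i++)
		if(strcmp(condname[i], s) == 0)
			return i;
	return -1;
}
.P2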
.PP
The syntax of the
.CW MCR
and
.CW MRC
coprocessor instructions is largely as in the manual, with the usual adjustments.
The assembler directly supports only the ARM floating-point coprocessor
operations used by the compiler:
.CW CMP ,
.CW ADD ,
.CW SUB ,
.CW MUL ,
and
.CW DIV ,
all with
.CW F
or
.CW D
suffix selecting single or double precision.
Floating-point loads and stores become
.CW MOVF
and
.CW MOVD .
Conversion instructions are also specified by moves:
.CW MOVWD ,
.CW MOVWF ,
.CW MOVDW ,
.CW MOVFW ,
.CW MOVFD ,
and
.CW MOVDF .
1395