@c Copyright (C) 1991-2022 Free Software Foundation, Inc.
@c This is part of the GAS manual.
@c For copying conditions, see the file as.texinfo.
@c man end

@ifset GENERIC
@page
@node i386-Dependent
@chapter 80386 Dependent Features
@end ifset
@ifclear GENERIC
@node Machine Dependencies
@chapter 80386 Dependent Features
@end ifclear

@cindex i386 support
@cindex i80386 support
@cindex x86-64 support

The i386 version of @code{@value{AS}} supports the original Intel 386
architecture in both 16-bit and 32-bit modes, as well as the AMD x86-64
architecture, which extends the Intel architecture to 64 bits.

@menu
* i386-Options::                Options
* i386-Directives::             X86 specific directives
* i386-Syntax::                 Syntactical considerations
* i386-Mnemonics::              Instruction Naming
* i386-Regs::                   Register Naming
* i386-Prefixes::               Instruction Prefixes
* i386-Memory::                 Memory References
* i386-Jumps::                  Handling of Jump Instructions
* i386-Float::                  Floating Point
* i386-SIMD::                   Intel's MMX and AMD's 3DNow! SIMD Operations
* i386-LWP::                    AMD's Lightweight Profiling Instructions
* i386-BMI::                    Bit Manipulation Instruction
* i386-TBM::                    AMD's Trailing Bit Manipulation Instructions
* i386-16bit::                  Writing 16-bit Code
* i386-Arch::                   Specifying an x86 CPU architecture
* i386-ISA::                    AMD64 ISA vs. Intel64 ISA
* i386-Bugs::                   AT&T Syntax bugs
* i386-Notes::                  Notes
@end menu

@node i386-Options
@section Options

@cindex options for i386
@cindex options for x86-64
@cindex i386 options
@cindex x86-64 options

The i386 version of @code{@value{AS}} has a few machine
dependent options:

@c man begin OPTIONS
@table @gcctabopt
@cindex @samp{--32} option, i386
@cindex @samp{--32} option, x86-64
@cindex @samp{--x32} option, i386
@cindex @samp{--x32} option, x86-64
@cindex @samp{--64} option, i386
@cindex @samp{--64} option, x86-64
@item --32 | --x32 | --64
Select the word size, either 32 bits or 64 bits.
@samp{--32}
implies Intel i386 architecture, while @samp{--x32} and @samp{--64}
imply AMD x86-64 architecture with 32-bit or 64-bit word-size
respectively.

These options are only available with the ELF object file format, and
require that the necessary BFD support has been included (on a 32-bit
platform you have to add --enable-64-bit-bfd to configure to enable 64-bit
usage and use x86-64 as target platform).

@item -n
By default, x86 GAS replaces multiple nop instructions used for
alignment within code sections with multi-byte nop instructions such
as leal 0(%esi,1),%esi. This switch disables the optimization if a single
byte nop (0x90) is explicitly specified as the fill byte for alignment.

@cindex @samp{--divide} option, i386
@item --divide
On SVR4-derived platforms, the character @samp{/} is treated as a comment
character, which means that it cannot be used in expressions. The
@samp{--divide} option turns @samp{/} into a normal character. This does
not disable @samp{/} at the beginning of a line starting a comment, or
affect using @samp{#} for starting a comment.

@cindex @samp{-march=} option, i386
@cindex @samp{-march=} option, x86-64
@item -march=@var{CPU}[+@var{EXTENSION}@dots{}]
This option specifies the target processor. The assembler will
issue an error message if an attempt is made to assemble an instruction
which will not execute on the target processor.
The following
processor names are recognized:
@code{i8086},
@code{i186},
@code{i286},
@code{i386},
@code{i486},
@code{i586},
@code{i686},
@code{pentium},
@code{pentiumpro},
@code{pentiumii},
@code{pentiumiii},
@code{pentium4},
@code{prescott},
@code{nocona},
@code{core},
@code{core2},
@code{corei7},
@code{iamcu},
@code{k6},
@code{k6_2},
@code{athlon},
@code{opteron},
@code{k8},
@code{amdfam10},
@code{bdver1},
@code{bdver2},
@code{bdver3},
@code{bdver4},
@code{znver1},
@code{znver2},
@code{znver3},
@code{btver1},
@code{btver2},
@code{generic32} and
@code{generic64}.

In addition to the basic instruction set, the assembler can be told to
accept various extension mnemonics. For example,
@code{-march=i686+sse4+vmx} extends @var{i686} with @var{sse4} and
@var{vmx}. The following extensions are currently supported:
@code{8087},
@code{287},
@code{387},
@code{687},
@code{no87},
@code{no287},
@code{no387},
@code{no687},
@code{cmov},
@code{nocmov},
@code{fxsr},
@code{nofxsr},
@code{mmx},
@code{nommx},
@code{sse},
@code{sse2},
@code{sse3},
@code{sse4a},
@code{ssse3},
@code{sse4.1},
@code{sse4.2},
@code{sse4},
@code{nosse},
@code{nosse2},
@code{nosse3},
@code{nosse4a},
@code{nossse3},
@code{nosse4.1},
@code{nosse4.2},
@code{nosse4},
@code{avx},
@code{avx2},
@code{noavx},
@code{noavx2},
@code{adx},
@code{rdseed},
@code{prfchw},
@code{smap},
@code{mpx},
@code{sha},
@code{rdpid},
@code{ptwrite},
@code{cet},
@code{gfni},
@code{vaes},
@code{vpclmulqdq},
@code{prefetchwt1},
@code{clflushopt},
@code{se1},
@code{clwb},
@code{movdiri},
@code{movdir64b},
@code{enqcmd},
@code{serialize},
@code{tsxldtrk},
@code{kl},
@code{nokl},
@code{widekl},
@code{nowidekl},
@code{hreset},
@code{avx512f},
@code{avx512cd},
@code{avx512er},
@code{avx512pf},
@code{avx512vl},
@code{avx512bw},
@code{avx512dq},
@code{avx512ifma},
@code{avx512vbmi},
@code{avx512_4fmaps},
@code{avx512_4vnniw},
@code{avx512_vpopcntdq},
@code{avx512_vbmi2},
@code{avx512_vnni},
@code{avx512_bitalg},
@code{avx512_vp2intersect},
@code{tdx},
@code{avx512_bf16},
@code{avx_vnni},
@code{avx512_fp16},
@code{noavx512f},
@code{noavx512cd},
@code{noavx512er},
@code{noavx512pf},
@code{noavx512vl},
@code{noavx512bw},
@code{noavx512dq},
@code{noavx512ifma},
@code{noavx512vbmi},
@code{noavx512_4fmaps},
@code{noavx512_4vnniw},
@code{noavx512_vpopcntdq},
@code{noavx512_vbmi2},
@code{noavx512_vnni},
@code{noavx512_bitalg},
@code{noavx512_vp2intersect},
@code{notdx},
@code{noavx512_bf16},
@code{noavx_vnni},
@code{noavx512_fp16},
@code{noenqcmd},
@code{noserialize},
@code{notsxldtrk},
@code{amx_int8},
@code{noamx_int8},
@code{amx_bf16},
@code{noamx_bf16},
@code{amx_tile},
@code{noamx_tile},
@code{nouintr},
@code{nohreset},
@code{vmx},
@code{vmfunc},
@code{smx},
@code{xsave},
@code{xsaveopt},
@code{xsavec},
@code{xsaves},
@code{aes},
@code{pclmul},
@code{fsgsbase},
@code{rdrnd},
@code{f16c},
@code{bmi2},
@code{fma},
@code{movbe},
@code{ept},
@code{lzcnt},
@code{popcnt},
@code{hle},
@code{rtm},
@code{invpcid},
@code{clflush},
@code{mwaitx},
@code{clzero},
@code{wbnoinvd},
@code{pconfig},
@code{waitpkg},
@code{uintr},
@code{cldemote},
@code{rdpru},
@code{mcommit},
@code{sev_es},
@code{lwp},
@code{fma4},
@code{xop},
@code{cx16},
@code{syscall},
@code{rdtscp},
@code{3dnow},
@code{3dnowa},
@code{sse4a},
@code{sse5},
@code{snp},
@code{invlpgb},
@code{tlbsync},
@code{svme} and
@code{padlock}.
Note that rather than extending a basic instruction set, the extension
mnemonics starting with @code{no} revoke the respective functionality.

When the @code{.arch} directive is used with @option{-march}, the
@code{.arch} directive will take precedence.

@cindex @samp{-mtune=} option, i386
@cindex @samp{-mtune=} option, x86-64
@item -mtune=@var{CPU}
This option specifies a processor to optimize for. When used in
conjunction with the @option{-march} option, only instructions
of the processor specified by the @option{-march} option will be
generated.

Valid @var{CPU} values are identical to the processor list of
@option{-march=@var{CPU}}.

@cindex @samp{-msse2avx} option, i386
@cindex @samp{-msse2avx} option, x86-64
@item -msse2avx
This option specifies that the assembler should encode SSE instructions
with the VEX prefix.

@cindex @samp{-muse-unaligned-vector-move} option, i386
@cindex @samp{-muse-unaligned-vector-move} option, x86-64
@item -muse-unaligned-vector-move
This option specifies that the assembler should encode aligned vector
moves as unaligned vector moves.

@cindex @samp{-msse-check=} option, i386
@cindex @samp{-msse-check=} option, x86-64
@item -msse-check=@var{none}
@itemx -msse-check=@var{warning}
@itemx -msse-check=@var{error}
These options control whether the assembler should check SSE instructions.
@option{-msse-check=@var{none}} will make the assembler not check SSE
instructions, which is the default. @option{-msse-check=@var{warning}}
will make the assembler issue a warning for any SSE instruction.
@option{-msse-check=@var{error}} will make the assembler issue an error
for any SSE instruction.

@cindex @samp{-mavxscalar=} option, i386
@cindex @samp{-mavxscalar=} option, x86-64
@item -mavxscalar=@var{128}
@itemx -mavxscalar=@var{256}
These options control how the assembler should encode scalar AVX
instructions.
@option{-mavxscalar=@var{128}} will encode scalar
AVX instructions with 128-bit vector length, which is the default.
@option{-mavxscalar=@var{256}} will encode scalar AVX instructions
with 256-bit vector length.

WARNING: Don't use this for production code - due to CPU errata the
resulting code may not work on certain models.

@cindex @samp{-mvexwig=} option, i386
@cindex @samp{-mvexwig=} option, x86-64
@item -mvexwig=@var{0}
@itemx -mvexwig=@var{1}
These options control how the assembler should encode VEX.W-ignored (WIG)
VEX instructions. @option{-mvexwig=@var{0}} will encode WIG VEX
instructions with vex.w = 0, which is the default.
@option{-mvexwig=@var{1}} will encode WIG VEX instructions with
vex.w = 1.

WARNING: Don't use this for production code - due to CPU errata the
resulting code may not work on certain models.

@cindex @samp{-mevexlig=} option, i386
@cindex @samp{-mevexlig=} option, x86-64
@item -mevexlig=@var{128}
@itemx -mevexlig=@var{256}
@itemx -mevexlig=@var{512}
These options control how the assembler should encode length-ignored
(LIG) EVEX instructions. @option{-mevexlig=@var{128}} will encode LIG
EVEX instructions with 128-bit vector length, which is the default.
@option{-mevexlig=@var{256}} and @option{-mevexlig=@var{512}} will
encode LIG EVEX instructions with 256-bit and 512-bit vector length,
respectively.

@cindex @samp{-mevexwig=} option, i386
@cindex @samp{-mevexwig=} option, x86-64
@item -mevexwig=@var{0}
@itemx -mevexwig=@var{1}
These options control how the assembler should encode w-ignored (WIG)
EVEX instructions. @option{-mevexwig=@var{0}} will encode WIG
EVEX instructions with evex.w = 0, which is the default.
@option{-mevexwig=@var{1}} will encode WIG EVEX instructions with
evex.w = 1.

@cindex @samp{-mmnemonic=} option, i386
@cindex @samp{-mmnemonic=} option, x86-64
@item -mmnemonic=@var{att}
@itemx -mmnemonic=@var{intel}
This option specifies the instruction mnemonics used for matching
instructions. The @code{.att_mnemonic} and @code{.intel_mnemonic}
directives will take precedence.

@cindex @samp{-msyntax=} option, i386
@cindex @samp{-msyntax=} option, x86-64
@item -msyntax=@var{att}
@itemx -msyntax=@var{intel}
This option specifies the instruction syntax used when processing
instructions. The @code{.att_syntax} and @code{.intel_syntax} directives
will take precedence.

@cindex @samp{-mnaked-reg} option, i386
@cindex @samp{-mnaked-reg} option, x86-64
@item -mnaked-reg
This option specifies that registers don't require a @samp{%} prefix.
The @code{.att_syntax} and @code{.intel_syntax} directives will take
precedence.

@cindex @samp{-madd-bnd-prefix} option, i386
@cindex @samp{-madd-bnd-prefix} option, x86-64
@item -madd-bnd-prefix
This option forces the assembler to add a BND prefix to all branches, even
if such a prefix was not explicitly specified in the source code.

@cindex @samp{-mshared} option, i386
@cindex @samp{-mshared} option, x86-64
@item -mshared
On ELF targets, the assembler normally optimizes out non-PLT relocations
against defined non-weak global branch targets with default visibility.
The @samp{-mshared} option tells the assembler to generate code which
may go into a shared library where all non-weak global branch targets
with default visibility can be preempted. The resulting code is
slightly bigger. This option only affects the handling of branch
instructions.

@cindex @samp{-mbig-obj} option, i386
@cindex @samp{-mbig-obj} option, x86-64
@item -mbig-obj
On PE/COFF targets this option forces the use of the big object file
format, which allows more than 32768 sections.

@cindex @samp{-momit-lock-prefix=} option, i386
@cindex @samp{-momit-lock-prefix=} option, x86-64
@item -momit-lock-prefix=@var{no}
@itemx -momit-lock-prefix=@var{yes}
These options control how the assembler should encode the lock prefix.
They are intended as a workaround for processors that fail on the
lock prefix, and can only be safely used on single-core,
single-thread computers.
@option{-momit-lock-prefix=@var{yes}} will omit all lock prefixes.
@option{-momit-lock-prefix=@var{no}} will encode lock prefix as usual,
which is the default.

@cindex @samp{-mfence-as-lock-add=} option, i386
@cindex @samp{-mfence-as-lock-add=} option, x86-64
@item -mfence-as-lock-add=@var{no}
@itemx -mfence-as-lock-add=@var{yes}
These options control how the assembler should encode lfence, mfence and
sfence.
@option{-mfence-as-lock-add=@var{yes}} will encode lfence, mfence and
sfence as @samp{lock addl $0x0, (%rsp)} in 64-bit mode and
@samp{lock addl $0x0, (%esp)} in 32-bit mode.
@option{-mfence-as-lock-add=@var{no}} will encode lfence, mfence and
sfence as usual, which is the default.

@cindex @samp{-mrelax-relocations=} option, i386
@cindex @samp{-mrelax-relocations=} option, x86-64
@item -mrelax-relocations=@var{no}
@itemx -mrelax-relocations=@var{yes}
These options control whether the assembler should generate relax
relocations, R_386_GOT32X, in 32-bit mode, or R_X86_64_GOTPCRELX and
R_X86_64_REX_GOTPCRELX, in 64-bit mode.
@option{-mrelax-relocations=@var{yes}} will generate relax relocations.
@option{-mrelax-relocations=@var{no}} will not generate relax
relocations. The default can be controlled by a configure option
@option{--enable-x86-relax-relocations}.
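
As an illustrative sketch (the symbol name @code{foo} is hypothetical),
the two 64-bit GOT references below are the kind of instruction that
@option{-mrelax-relocations} affects. With
@option{-mrelax-relocations=@var{yes}} the assembler is expected to emit
the relaxable relocation types noted in the comments; with
@option{-mrelax-relocations=@var{no}} both should fall back to plain
@code{R_X86_64_GOTPCREL}.

@example
        .text
        # Load through the GOT; the instruction carries a REX prefix,
        # so the relaxable type is R_X86_64_REX_GOTPCRELX.
        movq    foo@@GOTPCREL(%rip), %rax
        # Indirect call through the GOT; no REX prefix,
        # so the relaxable type is R_X86_64_GOTPCRELX.
        call    *foo@@GOTPCREL(%rip)
@end example

The relocations actually emitted for a given object file can be checked
with @code{objdump -dr}.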

@cindex @samp{-malign-branch-boundary=} option, i386
@cindex @samp{-malign-branch-boundary=} option, x86-64
@item -malign-branch-boundary=@var{NUM}
This option controls how the assembler should align branches with segment
prefixes or NOP. @var{NUM} must be a power of 2, and must be either 0 or
no less than 16. Branches will be aligned within a @var{NUM} byte
boundary. @option{-malign-branch-boundary=0}, which is the default,
doesn't align branches.

@cindex @samp{-malign-branch=} option, i386
@cindex @samp{-malign-branch=} option, x86-64
@item -malign-branch=@var{TYPE}[+@var{TYPE}...]
This option specifies types of branches to align. @var{TYPE} is a
combination of @samp{jcc}, which aligns conditional jumps,
@samp{fused}, which aligns fused conditional jumps, @samp{jmp},
which aligns unconditional jumps, @samp{call}, which aligns calls,
@samp{ret}, which aligns rets, and @samp{indirect}, which aligns indirect
jumps and calls. The default is @option{-malign-branch=jcc+fused+jmp}.

@cindex @samp{-malign-branch-prefix-size=} option, i386
@cindex @samp{-malign-branch-prefix-size=} option, x86-64
@item -malign-branch-prefix-size=@var{NUM}
This option specifies the maximum number of prefixes on an instruction
to align branches. @var{NUM} should be between 0 and 5. The default
@var{NUM} is 5.

@cindex @samp{-mbranches-within-32B-boundaries} option, i386
@cindex @samp{-mbranches-within-32B-boundaries} option, x86-64
@item -mbranches-within-32B-boundaries
This option aligns conditional jumps, fused conditional jumps and
unconditional jumps within a 32 byte boundary with up to 5 segment prefixes
on an instruction. It is equivalent to
@option{-malign-branch-boundary=32}
@option{-malign-branch=jcc+fused+jmp}
@option{-malign-branch-prefix-size=5}.
The default doesn't align branches.

@cindex @samp{-mlfence-after-load=} option, i386
@cindex @samp{-mlfence-after-load=} option, x86-64
@item -mlfence-after-load=@var{no}
@itemx -mlfence-after-load=@var{yes}
These options control whether the assembler should generate lfence
after load instructions. @option{-mlfence-after-load=@var{yes}} will
generate lfence. @option{-mlfence-after-load=@var{no}} will not generate
lfence, which is the default.

@cindex @samp{-mlfence-before-indirect-branch=} option, i386
@cindex @samp{-mlfence-before-indirect-branch=} option, x86-64
@item -mlfence-before-indirect-branch=@var{none}
@itemx -mlfence-before-indirect-branch=@var{all}
@itemx -mlfence-before-indirect-branch=@var{register}
@itemx -mlfence-before-indirect-branch=@var{memory}
These options control whether the assembler should generate lfence
before indirect near branch instructions.
@option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
before indirect near branch via register and issue a warning before
indirect near branch via memory.
It also implicitly sets @option{-mlfence-before-ret=@var{shl}} when
there's no explicit @option{-mlfence-before-ret=}.
@option{-mlfence-before-indirect-branch=@var{register}} will generate
lfence before indirect near branch via register.
@option{-mlfence-before-indirect-branch=@var{memory}} will issue a
warning before indirect near branch via memory.
@option{-mlfence-before-indirect-branch=@var{none}} will neither generate
lfence nor issue a warning, which is the default. Note that lfence won't
be generated before indirect near branch via register with
@option{-mlfence-after-load=@var{yes}} since lfence will be generated
after loading branch target register.
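
A minimal sketch of the effect, assuming 64-bit code assembled with
@option{-mlfence-before-indirect-branch=@var{all}} and without
@option{-mlfence-after-load=@var{yes}} (the registers and the local
label are arbitrary choices for illustration):

@example
        jmp     *%rax           # via register: lfence is generated before the jmp
        call    *%rdx           # via register: lfence is generated before the call
        jmp     *(%rax)         # via memory: a warning is issued instead
        jmp     1f              # direct branch: not affected
1:
@end example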

@cindex @samp{-mlfence-before-ret=} option, i386
@cindex @samp{-mlfence-before-ret=} option, x86-64
@item -mlfence-before-ret=@var{none}
@itemx -mlfence-before-ret=@var{shl}
@itemx -mlfence-before-ret=@var{or}
@itemx -mlfence-before-ret=@var{yes}
@itemx -mlfence-before-ret=@var{not}
These options control whether the assembler should generate lfence
before ret. @option{-mlfence-before-ret=@var{or}} will generate an
or instruction with lfence.
@option{-mlfence-before-ret=@var{shl}} and
@option{-mlfence-before-ret=@var{yes}} will generate a shl instruction
with lfence. @option{-mlfence-before-ret=@var{not}} will generate a not
instruction with lfence. @option{-mlfence-before-ret=@var{none}} will not
generate lfence, which is the default.

@cindex @samp{-mx86-used-note=} option, i386
@cindex @samp{-mx86-used-note=} option, x86-64
@item -mx86-used-note=@var{no}
@itemx -mx86-used-note=@var{yes}
These options control whether the assembler should generate
GNU_PROPERTY_X86_ISA_1_USED and GNU_PROPERTY_X86_FEATURE_2_USED
GNU property notes. The default can be controlled by the
@option{--enable-x86-used-note} configure option.

@cindex @samp{-mevexrcig=} option, i386
@cindex @samp{-mevexrcig=} option, x86-64
@item -mevexrcig=@var{rne}
@itemx -mevexrcig=@var{rd}
@itemx -mevexrcig=@var{ru}
@itemx -mevexrcig=@var{rz}
These options control how the assembler should encode SAE-only
EVEX instructions. @option{-mevexrcig=@var{rne}} will encode RC bits
of EVEX instruction with 00, which is the default.
@option{-mevexrcig=@var{rd}}, @option{-mevexrcig=@var{ru}}
and @option{-mevexrcig=@var{rz}} will encode SAE-only EVEX instructions
with 01, 10 and 11 RC bits, respectively.

@cindex @samp{-mamd64} option, x86-64
@cindex @samp{-mintel64} option, x86-64
@item -mamd64
@itemx -mintel64
This option specifies that the assembler should accept only AMD64 or
Intel64 ISA in 64-bit mode.
The default is to accept common, Intel64
only and AMD64 ISAs.

@cindex @samp{-O0} option, i386
@cindex @samp{-O0} option, x86-64
@cindex @samp{-O} option, i386
@cindex @samp{-O} option, x86-64
@cindex @samp{-O1} option, i386
@cindex @samp{-O1} option, x86-64
@cindex @samp{-O2} option, i386
@cindex @samp{-O2} option, x86-64
@cindex @samp{-Os} option, i386
@cindex @samp{-Os} option, x86-64
@item -O0 | -O | -O1 | -O2 | -Os
Optimize instruction encoding with smaller instruction size. @samp{-O}
and @samp{-O1} encode 64-bit register load instructions with 64-bit
immediate as 32-bit register load instructions with 31-bit or 32-bit
immediates, encode 64-bit register clearing instructions with 32-bit
register clearing instructions, encode 256-bit/512-bit VEX/EVEX vector
register clearing instructions with 128-bit VEX vector register
clearing instructions, encode 128-bit/256-bit EVEX vector
register load/store instructions with VEX vector register load/store
instructions, and encode 128-bit/256-bit EVEX packed integer logical
instructions with 128-bit/256-bit VEX packed integer logical
instructions.

@samp{-O2} includes @samp{-O1} optimization plus encodes
256-bit/512-bit EVEX vector register clearing instructions with 128-bit
EVEX vector register clearing instructions. In 64-bit mode VEX encoded
instructions with commutative source operands will also have their
source operands swapped if this allows using the 2-byte VEX prefix form
instead of the 3-byte one. Certain forms of AND as well as OR with the
same (register) operand specified twice will also be changed to TEST.

@samp{-Os} includes @samp{-O2} optimization plus encodes 16-bit, 32-bit
and 64-bit register tests with immediate as 8-bit register test with
immediate. @samp{-O0} turns off this optimization.
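
As a rough illustration of the @samp{-O} rewrites described above, the
following 64-bit instructions each have a shorter 32-bit form that
leaves the same value in the register. The byte sequences in the
comments are the standard encodings of each form; whether a given
source line is actually rewritten depends on the assembler version and
the options in effect.

@example
        movq    $1, %rax        # as written: 48 c7 c0 01 00 00 00
                                # may become movl $1, %eax: b8 01 00 00 00
        xorq    %rax, %rax      # as written: 48 31 c0
                                # may become xorl %eax, %eax: 31 c0
@end example

Both replacements are safe because writing a 32-bit register in 64-bit
mode zero-extends the result into the upper half of the 64-bit
register.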

@end table
@c man end

@node i386-Directives
@section x86 specific Directives

@cindex machine directives, x86
@cindex x86 machine directives
@table @code

@cindex @code{lcomm} directive, COFF
@item .lcomm @var{symbol} , @var{length}[, @var{alignment}]
Reserve @var{length} (an absolute expression) bytes for a local common
denoted by @var{symbol}. The section and value of @var{symbol} are
those of the new local common. The addresses are allocated in the bss
section, so that at run-time the bytes start off zeroed. Since
@var{symbol} is not declared global, it is normally not visible to
@code{@value{LD}}. The optional third parameter, @var{alignment},
specifies the desired alignment of the symbol in the bss section.

This directive is only available for COFF based x86 targets.

@cindex @code{largecomm} directive, ELF
@item .largecomm @var{symbol} , @var{length}[, @var{alignment}]
This directive behaves in the same way as the @code{comm} directive
(@pxref{Comm}) except that the data is placed into the @code{.lbss}
section instead of the @code{.bss} section.

The directive is intended to be used for data which requires a large
amount of space, and it is only available for ELF based x86_64
targets.

@cindex @code{value} directive
@item .value @var{expression} [, @var{expression}]
This directive behaves in the same way as the @code{.short} directive,
taking a series of comma separated expressions and storing them as
two-byte wide values into the current section.

@c FIXME: Document other x86 specific directives ?
@c Eg: .code16gcc,

@end table

@node i386-Syntax
@section i386 Syntactical Considerations
@menu
* i386-Variations::           AT&T Syntax versus Intel Syntax
* i386-Chars::                Special Characters
@end menu

@node i386-Variations
@subsection AT&T Syntax versus Intel Syntax

@cindex i386 intel_syntax pseudo op
@cindex intel_syntax pseudo op, i386
@cindex i386 att_syntax pseudo op
@cindex att_syntax pseudo op, i386
@cindex i386 syntax compatibility
@cindex syntax compatibility, i386
@cindex x86-64 intel_syntax pseudo op
@cindex intel_syntax pseudo op, x86-64
@cindex x86-64 att_syntax pseudo op
@cindex att_syntax pseudo op, x86-64
@cindex x86-64 syntax compatibility
@cindex syntax compatibility, x86-64

@code{@value{AS}} now supports assembly using Intel assembler syntax.
@code{.intel_syntax} selects Intel mode, and @code{.att_syntax} switches
back to the usual AT&T mode for compatibility with the output of
@code{@value{GCC}}. Either of these directives may have an optional
argument, @code{prefix}, or @code{noprefix} specifying whether registers
require a @samp{%} prefix. AT&T System V/386 assembler syntax is quite
different from Intel syntax. We mention these differences because
almost all 80386 documents use Intel syntax.
Notable differences
between the two syntaxes are:

@cindex immediate operands, i386
@cindex i386 immediate operands
@cindex register operands, i386
@cindex i386 register operands
@cindex jump/call operands, i386
@cindex i386 jump/call operands
@cindex operand delimiters, i386

@cindex immediate operands, x86-64
@cindex x86-64 immediate operands
@cindex register operands, x86-64
@cindex x86-64 register operands
@cindex jump/call operands, x86-64
@cindex x86-64 jump/call operands
@cindex operand delimiters, x86-64
@itemize @bullet
@item
AT&T immediate operands are preceded by @samp{$}; Intel immediate
operands are undelimited (Intel @samp{push 4} is AT&T @samp{pushl $4}).
AT&T register operands are preceded by @samp{%}; Intel register operands
are undelimited. AT&T absolute (as opposed to PC relative) jump/call
operands are prefixed by @samp{*}; they are undelimited in Intel syntax.

@cindex i386 source, destination operands
@cindex source, destination operands; i386
@cindex x86-64 source, destination operands
@cindex source, destination operands; x86-64
@item
AT&T and Intel syntax use the opposite order for source and destination
operands. Intel @samp{add eax, 4} is AT&T @samp{addl $4, %eax}. The
@samp{source, dest} convention is maintained for compatibility with
previous Unix assemblers. Note that @samp{bound}, @samp{invlpga}, and
instructions with 2 immediate operands, such as the @samp{enter}
instruction, do @emph{not} have reversed order. @xref{i386-Bugs}.

@cindex mnemonic suffixes, i386
@cindex sizes operands, i386
@cindex i386 size suffixes
@cindex mnemonic suffixes, x86-64
@cindex sizes operands, x86-64
@cindex x86-64 size suffixes
@item
In AT&T syntax the size of memory operands is determined from the last
character of the instruction mnemonic.
Mnemonic suffixes of @samp{b},
@samp{w}, @samp{l} and @samp{q} specify byte (8-bit), word (16-bit), long
(32-bit) and quadruple word (64-bit) memory references. Mnemonic suffixes
of @samp{x}, @samp{y} and @samp{z} specify xmm (128-bit vector), ymm
(256-bit vector) and zmm (512-bit vector) memory references, only when there's
no other way to disambiguate an instruction. Intel syntax accomplishes this by
prefixing memory operands (@emph{not} the instruction mnemonics) with
@samp{byte ptr}, @samp{word ptr}, @samp{dword ptr}, @samp{qword ptr},
@samp{xmmword ptr}, @samp{ymmword ptr} and @samp{zmmword ptr}. Thus, Intel
syntax @samp{mov al, byte ptr @var{foo}} is @samp{movb @var{foo}, %al} in AT&T
syntax. In Intel syntax, @samp{fword ptr}, @samp{tbyte ptr} and
@samp{oword ptr} specify 48-bit, 80-bit and 128-bit memory references.

In 64-bit code, @samp{movabs} can be used to encode the @samp{mov}
instruction with the 64-bit displacement or immediate operand.

@cindex return instructions, i386
@cindex i386 jump, call, return
@cindex return instructions, x86-64
@cindex x86-64 jump, call, return
@item
Immediate form long jumps and calls are
@samp{lcall/ljmp $@var{section}, $@var{offset}} in AT&T syntax; the
Intel syntax is
@samp{call/jmp far @var{section}:@var{offset}}. Also, the far return
instruction
is @samp{lret $@var{stack-adjust}} in AT&T syntax; Intel syntax is
@samp{ret far @var{stack-adjust}}.

@cindex sections, i386
@cindex i386 sections
@cindex sections, x86-64
@cindex x86-64 sections
@item
The AT&T assembler does not provide support for multiple section
programs. Unix style systems expect all programs to be single sections.
@end itemize

@node i386-Chars
@subsection Special Characters

@cindex line comment character, i386
@cindex i386 line comment character
The presence of a @samp{#} appearing anywhere on a line indicates the
start of a comment that extends to the end of that line.

If a @samp{#} appears as the first character of a line then the whole
line is treated as a comment, but in this case the line can also be a
logical line number directive (@pxref{Comments}) or a preprocessor
control command (@pxref{Preprocessing}).

If the @option{--divide} command-line option has not been specified
then the @samp{/} character appearing anywhere on a line also
introduces a line comment.

@cindex line separator, i386
@cindex statement separator, i386
@cindex i386 line separator
The @samp{;} character can be used to separate statements on the same
line.

@node i386-Mnemonics
@section i386-Mnemonics
@subsection Instruction Naming

@cindex i386 instruction naming
@cindex instruction naming, i386
@cindex x86-64 instruction naming
@cindex instruction naming, x86-64

Instruction mnemonics are suffixed with one character modifiers which
specify the size of operands. The letters @samp{b}, @samp{w}, @samp{l}
and @samp{q} specify byte, word, long and quadruple word operands. If
no suffix is specified by an instruction then @code{@value{AS}} tries to
fill in the missing suffix based on the destination register operand
(the last one by convention). Thus, @samp{mov %ax, %bx} is equivalent
to @samp{movw %ax, %bx}; also, @samp{mov $1, %bx} is equivalent to
@samp{movw $1, %bx}. Note that this is incompatible with the AT&T Unix
assembler which assumes that a missing mnemonic suffix implies long
operand size. (This incompatibility does not affect compiler output
since compilers always explicitly specify the mnemonic suffix.)
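
The inference rule can be sketched as follows, assuming 32-bit code in
AT&T syntax (the operands are arbitrary):

@example
        mov     %ax, %bx        # destination is a 16-bit register: same as movw
        mov     $1, %bx         # destination is a 16-bit register: same as movw
        mov     %eax, %ebx      # destination is a 32-bit register: same as movl
        mov     $1, %bl         # destination is an 8-bit register: same as movb
        movl    $1, (%ebx)      # memory destination, immediate source:
                                # no register fixes the size, so the suffix is written out
@end example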

When there is no sizing suffix and no (suitable) register operands to
deduce the size of memory operands, with a few exceptions and where long
operand size is possible in the first place, operand size will default
to long in 32- and 64-bit modes. Similarly it will default to short in
16-bit mode. Noteworthy exceptions are

@itemize @bullet
@item
Instructions with an implicit on-stack operand as well as branches,
which default to quad in 64-bit mode.

@item
Sign- and zero-extending moves, which default to byte size source
operands.

@item
Floating point insns with integer operands, which default to short (for
perhaps historical reasons).

@item
CRC32 with a 64-bit destination, which defaults to a quad source
operand.

@end itemize

@cindex encoding options, i386
@cindex encoding options, x86-64

Different encoding options can be specified via pseudo prefixes:

@itemize @bullet
@item
@samp{@{disp8@}} -- prefer 8-bit displacement.

@item
@samp{@{disp32@}} -- prefer 32-bit displacement.

@item
@samp{@{disp16@}} -- prefer 16-bit displacement.

@item
@samp{@{load@}} -- prefer load-form instruction.

@item
@samp{@{store@}} -- prefer store-form instruction.

@item
@samp{@{vex@}} -- encode with VEX prefix.

@item
@samp{@{vex3@}} -- encode with 3-byte VEX prefix.

@item
@samp{@{evex@}} -- encode with EVEX prefix.

@item
@samp{@{rex@}} -- prefer REX prefix for integer and legacy vector
instructions (x86-64 only). Note that this differs from the @samp{rex}
prefix which generates REX prefix unconditionally.

@item
@samp{@{nooptimize@}} -- disable instruction size optimization.
@end itemize

Mnemonics of Intel VNNI instructions are encoded with the EVEX prefix
by default.
The pseudo @samp{@{vex@}} prefix can be used to encode 878mnemonics of Intel VNNI instructions with the VEX prefix. 879 880@cindex conversion instructions, i386 881@cindex i386 conversion instructions 882@cindex conversion instructions, x86-64 883@cindex x86-64 conversion instructions 884The Intel-syntax conversion instructions 885 886@itemize @bullet 887@item 888@samp{cbw} --- sign-extend byte in @samp{%al} to word in @samp{%ax}, 889 890@item 891@samp{cwde} --- sign-extend word in @samp{%ax} to long in @samp{%eax}, 892 893@item 894@samp{cwd} --- sign-extend word in @samp{%ax} to long in @samp{%dx:%ax}, 895 896@item 897@samp{cdq} --- sign-extend dword in @samp{%eax} to quad in @samp{%edx:%eax}, 898 899@item 900@samp{cdqe} --- sign-extend dword in @samp{%eax} to quad in @samp{%rax} 901(x86-64 only), 902 903@item 904@samp{cqo} --- sign-extend quad in @samp{%rax} to octuple in 905@samp{%rdx:%rax} (x86-64 only), 906@end itemize 907 908@noindent 909are called @samp{cbtw}, @samp{cwtl}, @samp{cwtd}, @samp{cltd}, @samp{cltq}, and 910@samp{cqto} in AT&T naming. @code{@value{AS}} accepts either naming for these 911instructions. 912 913@cindex extension instructions, i386 914@cindex i386 extension instructions 915@cindex extension instructions, x86-64 916@cindex x86-64 extension instructions 917The Intel-syntax extension instructions 918 919@itemize @bullet 920@item 921@samp{movsx} --- sign-extend @samp{reg8/mem8} to @samp{reg16}. 922 923@item 924@samp{movsx} --- sign-extend @samp{reg8/mem8} to @samp{reg32}. 925 926@item 927@samp{movsx} --- sign-extend @samp{reg8/mem8} to @samp{reg64} 928(x86-64 only). 929 930@item 931@samp{movsx} --- sign-extend @samp{reg16/mem16} to @samp{reg32} 932 933@item 934@samp{movsx} --- sign-extend @samp{reg16/mem16} to @samp{reg64} 935(x86-64 only). 936 937@item 938@samp{movsxd} --- sign-extend @samp{reg32/mem32} to @samp{reg64} 939(x86-64 only). 940 941@item 942@samp{movzx} --- zero-extend @samp{reg8/mem8} to @samp{reg16}. 

@item
@samp{movzx} --- zero-extend @samp{reg8/mem8} to @samp{reg32}.

@item
@samp{movzx} --- zero-extend @samp{reg8/mem8} to @samp{reg64}
(x86-64 only).

@item
@samp{movzx} --- zero-extend @samp{reg16/mem16} to @samp{reg32}.

@item
@samp{movzx} --- zero-extend @samp{reg16/mem16} to @samp{reg64}
(x86-64 only).
@end itemize

@noindent
are called @samp{movsbw/movsxb/movsx}, @samp{movsbl/movsxb/movsx},
@samp{movsbq/movsxb/movsx}, @samp{movswl/movsxw}, @samp{movswq/movsxw},
@samp{movslq/movsxl}, @samp{movzbw/movzxb/movzx},
@samp{movzbl/movzxb/movzx}, @samp{movzbq/movzxb/movzx},
@samp{movzwl/movzxw} and @samp{movzwq/movzxw} in AT&T syntax.

@cindex jump instructions, i386
@cindex call instructions, i386
@cindex jump instructions, x86-64
@cindex call instructions, x86-64
Far call/jump instructions are @samp{lcall} and @samp{ljmp} in
AT&T syntax, but are @samp{call far} and @samp{jmp far} in Intel
convention.

@subsection AT&T Mnemonic versus Intel Mnemonic

@cindex i386 mnemonic compatibility
@cindex mnemonic compatibility, i386

@code{@value{AS}} supports assembly using Intel mnemonic.
@code{.intel_mnemonic} selects Intel mnemonic with Intel syntax, and
@code{.att_mnemonic} switches back to the usual AT&T mnemonic with AT&T
syntax for compatibility with the output of @code{@value{GCC}}.
Several x87 instructions, @samp{fadd}, @samp{fdiv}, @samp{fdivp},
@samp{fdivr}, @samp{fdivrp}, @samp{fmul}, @samp{fsub}, @samp{fsubp},
@samp{fsubr} and @samp{fsubrp}, are implemented in the AT&T System V/386
assembler with different mnemonics from those in the Intel IA32
specification.  @code{@value{GCC}} generates those instructions with
AT&T mnemonic.

@itemize @bullet
@item @samp{movslq} with AT&T mnemonic only accepts a 64-bit destination
register.
@samp{movsxd} should be used to encode 16-bit or 32-bit 992destination register with both AT&T and Intel mnemonics. 993@end itemize 994 995@node i386-Regs 996@section Register Naming 997 998@cindex i386 registers 999@cindex registers, i386 1000@cindex x86-64 registers 1001@cindex registers, x86-64 1002Register operands are always prefixed with @samp{%}. The 80386 registers 1003consist of 1004 1005@itemize @bullet 1006@item 1007the 8 32-bit registers @samp{%eax} (the accumulator), @samp{%ebx}, 1008@samp{%ecx}, @samp{%edx}, @samp{%edi}, @samp{%esi}, @samp{%ebp} (the 1009frame pointer), and @samp{%esp} (the stack pointer). 1010 1011@item 1012the 8 16-bit low-ends of these: @samp{%ax}, @samp{%bx}, @samp{%cx}, 1013@samp{%dx}, @samp{%di}, @samp{%si}, @samp{%bp}, and @samp{%sp}. 1014 1015@item 1016the 8 8-bit registers: @samp{%ah}, @samp{%al}, @samp{%bh}, 1017@samp{%bl}, @samp{%ch}, @samp{%cl}, @samp{%dh}, and @samp{%dl} (These 1018are the high-bytes and low-bytes of @samp{%ax}, @samp{%bx}, 1019@samp{%cx}, and @samp{%dx}) 1020 1021@item 1022the 6 section registers @samp{%cs} (code section), @samp{%ds} 1023(data section), @samp{%ss} (stack section), @samp{%es}, @samp{%fs}, 1024and @samp{%gs}. 1025 1026@item 1027the 5 processor control registers @samp{%cr0}, @samp{%cr2}, 1028@samp{%cr3}, @samp{%cr4}, and @samp{%cr8}. 1029 1030@item 1031the 6 debug registers @samp{%db0}, @samp{%db1}, @samp{%db2}, 1032@samp{%db3}, @samp{%db6}, and @samp{%db7}. 1033 1034@item 1035the 2 test registers @samp{%tr6} and @samp{%tr7}. 1036 1037@item 1038the 8 floating point register stack @samp{%st} or equivalently 1039@samp{%st(0)}, @samp{%st(1)}, @samp{%st(2)}, @samp{%st(3)}, 1040@samp{%st(4)}, @samp{%st(5)}, @samp{%st(6)}, and @samp{%st(7)}. 1041These registers are overloaded by 8 MMX registers @samp{%mm0}, 1042@samp{%mm1}, @samp{%mm2}, @samp{%mm3}, @samp{%mm4}, @samp{%mm5}, 1043@samp{%mm6} and @samp{%mm7}. 

@item
the 8 128-bit SSE registers @samp{%xmm0}, @samp{%xmm1}, @samp{%xmm2},
@samp{%xmm3}, @samp{%xmm4}, @samp{%xmm5}, @samp{%xmm6} and @samp{%xmm7}.
@end itemize

The AMD x86-64 architecture extends the register set by:

@itemize @bullet
@item
enhancing the 8 32-bit registers to 64-bit: @samp{%rax} (the
accumulator), @samp{%rbx}, @samp{%rcx}, @samp{%rdx}, @samp{%rdi},
@samp{%rsi}, @samp{%rbp} (the frame pointer), @samp{%rsp} (the stack
pointer)

@item
the 8 extended registers @samp{%r8}--@samp{%r15}.

@item
the 8 32-bit low ends of the extended registers: @samp{%r8d}--@samp{%r15d}.

@item
the 8 16-bit low ends of the extended registers: @samp{%r8w}--@samp{%r15w}.

@item
the 8 8-bit low ends of the extended registers: @samp{%r8b}--@samp{%r15b}.

@item
the 4 8-bit registers: @samp{%sil}, @samp{%dil}, @samp{%bpl}, @samp{%spl}.

@item
the 8 debug registers: @samp{%db8}--@samp{%db15}.

@item
the 8 128-bit SSE registers: @samp{%xmm8}--@samp{%xmm15}.
@end itemize

With the AVX extensions more registers were made available:

@itemize @bullet

@item
the 16 256-bit AVX registers @samp{%ymm0}--@samp{%ymm15} (only the first 8
available in 32-bit mode).  The bottom 128 bits are overlaid with the
@samp{%xmm0}--@samp{%xmm15} registers.

@end itemize

The AVX512 extensions added the following registers:

@itemize @bullet

@item
the 32 512-bit registers @samp{%zmm0}--@samp{%zmm31} (only the first 8
available in 32-bit mode).  The bottom 128 bits are overlaid with the
@samp{%xmm0}--@samp{%xmm31} registers and the first 256 bits are
overlaid with the @samp{%ymm0}--@samp{%ymm31} registers.

@item
the 8 mask registers @samp{%k0}--@samp{%k7}.

@end itemize

@node i386-Prefixes
@section Instruction Prefixes

@cindex i386 instruction prefixes
@cindex instruction prefixes, i386
@cindex prefixes, i386
Instruction prefixes are used to modify the following instruction.  They
are used to repeat string instructions, to provide section overrides, to
perform bus lock operations, and to change operand and address sizes.
(Most instructions that normally operate on 32-bit operands will use
16-bit operands if the instruction has an ``operand size'' prefix.)
Instruction prefixes are best written on the same line as the instruction
they act upon.  For example, the @samp{scas} (scan string) instruction is
repeated with:

@smallexample
        repne scas %es:(%edi),%al
@end smallexample

You may also place prefixes on the lines immediately preceding the
instruction, but this circumvents checks that @code{@value{AS}} does
with prefixes, and will not work with all prefixes.

Here is a list of instruction prefixes:

@cindex section override prefixes, i386
@itemize @bullet
@item
Section override prefixes @samp{cs}, @samp{ds}, @samp{ss}, @samp{es},
@samp{fs}, @samp{gs}.  These are automatically added when you use the
@var{section}:@var{memory-operand} form for memory references.

@cindex size prefixes, i386
@item
Operand/Address size prefixes @samp{data16} and @samp{addr16}
change 32-bit operands/addresses into 16-bit operands/addresses,
while @samp{data32} and @samp{addr32} change 16-bit ones (in a
@code{.code16} section) into 32-bit operands/addresses.  These prefixes
@emph{must} appear on the same line of code as the instruction they
modify.
For example, in a 16-bit @code{.code16} section, you might
write:

@smallexample
        addr32 jmpl *(%ebx)
@end smallexample

@cindex bus lock prefixes, i386
@cindex inhibiting interrupts, i386
@item
The bus lock prefix @samp{lock} inhibits interrupts during execution of
the instruction it precedes.  (This is only valid with certain
instructions; see an 80386 manual for details.)

@cindex coprocessor wait, i386
@item
The wait for coprocessor prefix @samp{wait} waits for the coprocessor to
complete the current instruction.  This should never be needed for the
80386/80387 combination.

@cindex repeat prefixes, i386
@item
The @samp{rep}, @samp{repe}, and @samp{repne} prefixes are added
to string instructions to make them repeat @samp{%ecx} times (@samp{%cx}
times if the current address size is 16 bits).

@cindex REX prefixes, i386
@item
The @samp{rex} family of prefixes is used by x86-64 to encode
extensions to the i386 instruction set.  The @samp{rex} prefix has four
bits: an operand size override (@code{64}) used to change operand size
from 32-bit to 64-bit, and X, Y and Z extension bits used to extend the
register set.

You may write the @samp{rex} prefixes directly.  The @samp{rex64xyz}
prefix emits a @samp{rex} prefix with all the bits set.  By omitting
the @code{64}, @code{x}, @code{y} or @code{z} you may write other
prefixes as well.  Normally, there is no need to write the prefixes
explicitly, since gas will automatically generate them based on the
instruction operands.
1185@end itemize 1186 1187@node i386-Memory 1188@section Memory References 1189 1190@cindex i386 memory references 1191@cindex memory references, i386 1192@cindex x86-64 memory references 1193@cindex memory references, x86-64 1194An Intel syntax indirect memory reference of the form 1195 1196@smallexample 1197@var{section}:[@var{base} + @var{index}*@var{scale} + @var{disp}] 1198@end smallexample 1199 1200@noindent 1201is translated into the AT&T syntax 1202 1203@smallexample 1204@var{section}:@var{disp}(@var{base}, @var{index}, @var{scale}) 1205@end smallexample 1206 1207@noindent 1208where @var{base} and @var{index} are the optional 32-bit base and 1209index registers, @var{disp} is the optional displacement, and 1210@var{scale}, taking the values 1, 2, 4, and 8, multiplies @var{index} 1211to calculate the address of the operand. If no @var{scale} is 1212specified, @var{scale} is taken to be 1. @var{section} specifies the 1213optional section register for the memory operand, and may override the 1214default section register (see a 80386 manual for section register 1215defaults). Note that section overrides in AT&T syntax @emph{must} 1216be preceded by a @samp{%}. If you specify a section override which 1217coincides with the default section register, @code{@value{AS}} does @emph{not} 1218output any section register override prefixes to assemble the given 1219instruction. Thus, section overrides can be specified to emphasize which 1220section register is used for a given memory operand. 1221 1222Here are some examples of Intel and AT&T style memory references: 1223 1224@table @asis 1225@item AT&T: @samp{-4(%ebp)}, Intel: @samp{[ebp - 4]} 1226@var{base} is @samp{%ebp}; @var{disp} is @samp{-4}. @var{section} is 1227missing, and the default section is used (@samp{%ss} for addressing with 1228@samp{%ebp} as the base register). @var{index}, @var{scale} are both missing. 

@item AT&T: @samp{foo(,%eax,4)}, Intel: @samp{[foo + eax*4]}
@var{index} is @samp{%eax} (scaled by a @var{scale} 4); @var{disp} is
@samp{foo}.  All other fields are missing.  The section register here
defaults to @samp{%ds}.

@item AT&T: @samp{foo(,1)}; Intel @samp{[foo]}
This uses the value pointed to by @samp{foo} as a memory operand.
Note that @var{base} and @var{index} are both missing, but there is only
@emph{one} @samp{,}.  This is a syntactic exception.

@item AT&T: @samp{%gs:foo}; Intel @samp{gs:foo}
This selects the contents of the variable @samp{foo} with section
register @var{section} being @samp{%gs}.
@end table

Absolute (as opposed to PC relative) call and jump operands must be
prefixed with @samp{*}.  If no @samp{*} is specified, @code{@value{AS}}
always chooses PC relative addressing for jump/call labels.

Any instruction that has a memory operand, but no register operand,
@emph{must} specify its size (byte, word, long, or quadruple) with an
instruction mnemonic suffix (@samp{b}, @samp{w}, @samp{l} or @samp{q},
respectively).

The x86-64 architecture adds RIP-relative (instruction pointer relative)
addressing.  This addressing mode is specified by using @samp{rip} as a
base register.  Only constant offsets are valid.  For example:

@table @asis
@item AT&T: @samp{1234(%rip)}, Intel: @samp{[rip + 1234]}
Points to the address 1234 bytes past the end of the current
instruction.

@item AT&T: @samp{symbol(%rip)}, Intel: @samp{[rip + symbol]}
Points to @code{symbol} in an RIP-relative way; this is shorter than
the default absolute addressing.
@end table

Other addressing modes remain unchanged in the x86-64 architecture, except
that the registers used are 64-bit instead of 32-bit.
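
The forms described in this section can be combined in ordinary
instructions.  The fragment below is an illustrative sketch only:
@samp{foo}, @samp{table} and @samp{counter} are hypothetical symbols
assumed to be defined elsewhere, every line but the last assumes 32-bit
mode, and the last line assumes 64-bit mode.

@smallexample
        movl    -4(%ebp), %eax          # base + disp; Intel: [ebp - 4]
        movl    table(,%ecx,4), %eax    # index*scale + disp; Intel: [table + ecx*4]
        movl    8(%ebx,%esi,2), %edx    # Intel: [ebx + esi*2 + 8]
        movl    %gs:foo, %eax           # section override; Intel: gs:foo
        incl    counter                 # memory operand only: suffix required
        jmp     *%eax                   # absolute jump operand needs a leading *
        movl    counter(%rip), %eax     # RIP-relative (x86-64 only)
@end smallexample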
1270 1271@node i386-Jumps 1272@section Handling of Jump Instructions 1273 1274@cindex jump optimization, i386 1275@cindex i386 jump optimization 1276@cindex jump optimization, x86-64 1277@cindex x86-64 jump optimization 1278Jump instructions are always optimized to use the smallest possible 1279displacements. This is accomplished by using byte (8-bit) displacement 1280jumps whenever the target is sufficiently close. If a byte displacement 1281is insufficient a long displacement is used. We do not support 1282word (16-bit) displacement jumps in 32-bit mode (i.e. prefixing the jump 1283instruction with the @samp{data16} instruction prefix), since the 80386 1284insists upon masking @samp{%eip} to 16 bits after the word displacement 1285is added. (See also @pxref{i386-Arch}) 1286 1287Note that the @samp{jcxz}, @samp{jecxz}, @samp{loop}, @samp{loopz}, 1288@samp{loope}, @samp{loopnz} and @samp{loopne} instructions only come in byte 1289displacements, so that if you use these instructions (@code{@value{GCC}} does 1290not use them) you may get an error message (and incorrect code). The AT&T 129180386 assembler tries to get around this problem by expanding @samp{jcxz foo} 1292to 1293 1294@smallexample 1295 jcxz cx_zero 1296 jmp cx_nonzero 1297cx_zero: jmp foo 1298cx_nonzero: 1299@end smallexample 1300 1301@node i386-Float 1302@section Floating Point 1303 1304@cindex i386 floating point 1305@cindex floating point, i386 1306@cindex x86-64 floating point 1307@cindex floating point, x86-64 1308All 80387 floating point types except packed BCD are supported. 1309(BCD support may be added without much difficulty). These data 1310types are 16-, 32-, and 64- bit integers, and single (32-bit), 1311double (64-bit), and extended (80-bit) precision floating point. 1312Each supported type has an instruction mnemonic suffix and a constructor 1313associated with it. Instruction mnemonic suffixes specify the operand's 1314data type. Constructors build these data types into memory. 
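
A minimal sketch of how suffixes and constructors pair up, assuming the
hypothetical labels @samp{f32}, @samp{f64}, @samp{f80}, @samp{i16},
@samp{i32} and @samp{i64} (none of them come from this manual):

@smallexample
        .data
f32:    .single 1.5             # 32-bit real,    suffix s
f64:    .double 1.5             # 64-bit real,    suffix l
f80:    .tfloat 1.5             # 80-bit real,    suffix t
i16:    .word   3               # 16-bit integer, suffix s
i32:    .long   3               # 32-bit integer, suffix l
i64:    .quad   3               # 64-bit integer, suffix q

        .text
        flds    f32             # load each value onto the FP stack
        fldl    f64
        fldt    f80
        filds   i16
        fildl   i32
        fildq   i64
@end smallexample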
1315 1316@cindex @code{float} directive, i386 1317@cindex @code{single} directive, i386 1318@cindex @code{double} directive, i386 1319@cindex @code{tfloat} directive, i386 1320@cindex @code{hfloat} directive, i386 1321@cindex @code{bfloat16} directive, i386 1322@cindex @code{float} directive, x86-64 1323@cindex @code{single} directive, x86-64 1324@cindex @code{double} directive, x86-64 1325@cindex @code{tfloat} directive, x86-64 1326@cindex @code{hfloat} directive, x86-64 1327@cindex @code{bfloat16} directive, x86-64 1328@itemize @bullet 1329@item 1330Floating point constructors are @samp{.float} or @samp{.single}, 1331@samp{.double}, @samp{.tfloat}, @samp{.hfloat}, and @samp{.bfloat16} for 32-, 133264-, 80-, and 16-bit (two flavors) formats respectively. The former three 1333correspond to instruction mnemonic suffixes @samp{s}, @samp{l}, and @samp{t}. 1334@samp{t} stands for 80-bit (ten byte) real. The 80387 only supports this 1335format via the @samp{fldt} (load 80-bit real to stack top) and @samp{fstpt} 1336(store 80-bit real and pop stack) instructions. 1337 1338@cindex @code{word} directive, i386 1339@cindex @code{long} directive, i386 1340@cindex @code{int} directive, i386 1341@cindex @code{quad} directive, i386 1342@cindex @code{word} directive, x86-64 1343@cindex @code{long} directive, x86-64 1344@cindex @code{int} directive, x86-64 1345@cindex @code{quad} directive, x86-64 1346@item 1347Integer constructors are @samp{.word}, @samp{.long} or @samp{.int}, and 1348@samp{.quad} for the 16-, 32-, and 64-bit integer formats. The 1349corresponding instruction mnemonic suffixes are @samp{s} (short), 1350@samp{l} (long), and @samp{q} (quad). As with the 80-bit real format, 1351the 64-bit @samp{q} format is only present in the @samp{fildq} (load 1352quad integer to stack top) and @samp{fistpq} (store quad integer and pop 1353stack) instructions. 1354@end itemize 1355 1356Register to register operations should not use instruction mnemonic suffixes. 
@samp{fstl %st, %st(1)} will give a warning, and be assembled as if you
wrote @samp{fst %st, %st(1)}, since all register to register operations
use 80-bit floating point operands.  (Contrast this with @samp{fstl %st, mem},
which converts @samp{%st} from 80-bit to 64-bit floating point format,
then stores the result in the 8-byte location @samp{mem}.)

@node i386-SIMD
@section Intel's MMX and AMD's 3DNow! SIMD Operations

@cindex MMX, i386
@cindex 3DNow!, i386
@cindex SIMD, i386
@cindex MMX, x86-64
@cindex 3DNow!, x86-64
@cindex SIMD, x86-64

@code{@value{AS}} supports Intel's MMX instruction set (SIMD
instructions for integer data), available on Intel's Pentium MMX
processors and Pentium II processors, AMD's K6 and K6-2 processors,
Cyrix' M2 processor, and probably others.  It also supports AMD's 3DNow!@:
instruction set (SIMD instructions for 32-bit floating point data)
available on AMD's K6-2 processor and possibly others in the future.

Currently, @code{@value{AS}} does not support Intel's floating point
SIMD, Katmai (KNI).

The eight 64-bit MMX operands, also used by 3DNow!, are called @samp{%mm0},
@samp{%mm1}, ... @samp{%mm7}.  They contain eight 8-bit integers, four
16-bit integers, two 32-bit integers, one 64-bit integer, or two 32-bit
floating point values.  The MMX registers cannot be used at the same time
as the floating point stack.

See Intel and AMD documentation, keeping in mind that the operand order in
instructions is reversed from the Intel syntax.

@node i386-LWP
@section AMD's Lightweight Profiling Instructions

@cindex LWP, i386
@cindex LWP, x86-64

@code{@value{AS}} supports AMD's Lightweight Profiling (LWP)
instruction set, available on AMD's Family 15h (Orochi) processors.
1400 1401LWP enables applications to collect and manage performance data, and 1402react to performance events. The collection of performance data 1403requires no context switches. LWP runs in the context of a thread and 1404so several counters can be used independently across multiple threads. 1405LWP can be used in both 64-bit and legacy 32-bit modes. 1406 1407For detailed information on the LWP instruction set, see the 1408@cite{AMD Lightweight Profiling Specification} available at 1409@uref{http://developer.amd.com/cpu/LWP,Lightweight Profiling Specification}. 1410 1411@node i386-BMI 1412@section Bit Manipulation Instructions 1413 1414@cindex BMI, i386 1415@cindex BMI, x86-64 1416 1417@code{@value{AS}} supports the Bit Manipulation (BMI) instruction set. 1418 1419BMI instructions provide several instructions implementing individual 1420bit manipulation operations such as isolation, masking, setting, or 1421resetting. 1422 1423@c Need to add a specification citation here when available. 1424 1425@node i386-TBM 1426@section AMD's Trailing Bit Manipulation Instructions 1427 1428@cindex TBM, i386 1429@cindex TBM, x86-64 1430 1431@code{@value{AS}} supports AMD's Trailing Bit Manipulation (TBM) 1432instruction set, available on AMD's BDVER2 processors (Trinity and 1433Viperfish). 1434 1435TBM instructions provide instructions implementing individual bit 1436manipulation operations such as isolating, masking, setting, resetting, 1437complementing, and operations on trailing zeros and ones. 1438 1439@c Need to add a specification citation here when available. 
1440 1441@node i386-16bit 1442@section Writing 16-bit Code 1443 1444@cindex i386 16-bit code 1445@cindex 16-bit code, i386 1446@cindex real-mode code, i386 1447@cindex @code{code16gcc} directive, i386 1448@cindex @code{code16} directive, i386 1449@cindex @code{code32} directive, i386 1450@cindex @code{code64} directive, i386 1451@cindex @code{code64} directive, x86-64 1452While @code{@value{AS}} normally writes only ``pure'' 32-bit i386 code 1453or 64-bit x86-64 code depending on the default configuration, 1454it also supports writing code to run in real mode or in 16-bit protected 1455mode code segments. To do this, put a @samp{.code16} or 1456@samp{.code16gcc} directive before the assembly language instructions to 1457be run in 16-bit mode. You can switch @code{@value{AS}} to writing 145832-bit code with the @samp{.code32} directive or 64-bit code with the 1459@samp{.code64} directive. 1460 1461@samp{.code16gcc} provides experimental support for generating 16-bit 1462code from gcc, and differs from @samp{.code16} in that @samp{call}, 1463@samp{ret}, @samp{enter}, @samp{leave}, @samp{push}, @samp{pop}, 1464@samp{pusha}, @samp{popa}, @samp{pushf}, and @samp{popf} instructions 1465default to 32-bit size. This is so that the stack pointer is 1466manipulated in the same way over function calls, allowing access to 1467function parameters at the same stack offsets as in 32-bit mode. 1468@samp{.code16gcc} also automatically adds address size prefixes where 1469necessary to use the 32-bit addressing modes that gcc generates. 1470 1471The code which @code{@value{AS}} generates in 16-bit mode will not 1472necessarily run on a 16-bit pre-80386 processor. To write code that 1473runs on such a processor, you must refrain from using @emph{any} 32-bit 1474constructs which require @code{@value{AS}} to output address or operand 1475size prefixes. 
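
As an illustrative sketch (the label and register choices are
arbitrary), the fragment below marks which instructions in a
@samp{.code16} section need a size prefix and therefore will not run on
a pre-80386 processor:

@smallexample
        .code16
start16:
        movw    $0x10, %ax      # 16-bit operand: no prefix
        movw    %ax, %ds
        movw    (%bx,%si), %cx  # 16-bit addressing: no prefix
        movl    $1, %eax        # 32-bit operand: operand size prefix (0x66)
        movw    (%ebx), %cx     # 32-bit addressing: address size prefix (0x67)
        .code32                 # switch back to 32-bit code
@end smallexample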
1476 1477Note that writing 16-bit code instructions by explicitly specifying a 1478prefix or an instruction mnemonic suffix within a 32-bit code section 1479generates different machine instructions than those generated for a 148016-bit code segment. In a 32-bit code section, the following code 1481generates the machine opcode bytes @samp{66 6a 04}, which pushes the 1482value @samp{4} onto the stack, decrementing @samp{%esp} by 2. 1483 1484@smallexample 1485 pushw $4 1486@end smallexample 1487 1488The same code in a 16-bit code section would generate the machine 1489opcode bytes @samp{6a 04} (i.e., without the operand size prefix), which 1490is correct since the processor default operand size is assumed to be 16 1491bits in a 16-bit code section. 1492 1493@node i386-Arch 1494@section Specifying CPU Architecture 1495 1496@cindex arch directive, i386 1497@cindex i386 arch directive 1498@cindex arch directive, x86-64 1499@cindex x86-64 arch directive 1500 1501@code{@value{AS}} may be told to assemble for a particular CPU 1502(sub-)architecture with the @code{.arch @var{cpu_type}} directive. This 1503directive enables a warning when gas detects an instruction that is not 1504supported on the CPU specified. 
The choices for @var{cpu_type} are: 1505 1506@multitable @columnfractions .20 .20 .20 .20 1507@item @samp{default} @tab @samp{push} @tab @samp{pop} 1508@item @samp{i8086} @tab @samp{i186} @tab @samp{i286} @tab @samp{i386} 1509@item @samp{i486} @tab @samp{i586} @tab @samp{i686} @tab @samp{pentium} 1510@item @samp{pentiumpro} @tab @samp{pentiumii} @tab @samp{pentiumiii} @tab @samp{pentium4} 1511@item @samp{prescott} @tab @samp{nocona} @tab @samp{core} @tab @samp{core2} 1512@item @samp{corei7} @tab @samp{iamcu} 1513@item @samp{k6} @tab @samp{k6_2} @tab @samp{athlon} @tab @samp{k8} 1514@item @samp{amdfam10} @tab @samp{bdver1} @tab @samp{bdver2} @tab @samp{bdver3} 1515@item @samp{bdver4} @tab @samp{znver1} @tab @samp{znver2} @tab @samp{znver3} 1516@item @samp{btver1} @tab @samp{btver2} @tab @samp{generic32} @tab @samp{generic64} 1517@item @samp{.cmov} @tab @samp{.fxsr} @tab @samp{.mmx} 1518@item @samp{.sse} @tab @samp{.sse2} @tab @samp{.sse3} @tab @samp{.sse4a} 1519@item @samp{.ssse3} @tab @samp{.sse4.1} @tab @samp{.sse4.2} @tab @samp{.sse4} 1520@item @samp{.avx} @tab @samp{.vmx} @tab @samp{.smx} @tab @samp{.ept} 1521@item @samp{.clflush} @tab @samp{.movbe} @tab @samp{.xsave} @tab @samp{.xsaveopt} 1522@item @samp{.aes} @tab @samp{.pclmul} @tab @samp{.fma} @tab @samp{.fsgsbase} 1523@item @samp{.rdrnd} @tab @samp{.f16c} @tab @samp{.avx2} @tab @samp{.bmi2} 1524@item @samp{.lzcnt} @tab @samp{.popcnt} @tab @samp{.invpcid} @tab @samp{.vmfunc} 1525@item @samp{.hle} 1526@item @samp{.rtm} @tab @samp{.adx} @tab @samp{.rdseed} @tab @samp{.prfchw} 1527@item @samp{.smap} @tab @samp{.mpx} @tab @samp{.sha} @tab @samp{.prefetchwt1} 1528@item @samp{.clflushopt} @tab @samp{.xsavec} @tab @samp{.xsaves} @tab @samp{.se1} 1529@item @samp{.avx512f} @tab @samp{.avx512cd} @tab @samp{.avx512er} @tab @samp{.avx512pf} 1530@item @samp{.avx512vl} @tab @samp{.avx512bw} @tab @samp{.avx512dq} @tab @samp{.avx512ifma} 1531@item @samp{.avx512vbmi} @tab @samp{.avx512_4fmaps} @tab @samp{.avx512_4vnniw} 
1532@item @samp{.avx512_vpopcntdq} @tab @samp{.avx512_vbmi2} @tab @samp{.avx512_vnni} 1533@item @samp{.avx512_bitalg} @tab @samp{.avx512_bf16} @tab @samp{.avx512_vp2intersect} 1534@item @samp{.tdx} @tab @samp{.avx_vnni} @tab @samp{.avx512_fp16} 1535@item @samp{.clwb} @tab @samp{.rdpid} @tab @samp{.ptwrite} @tab @samp{.ibt} 1536@item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote} 1537@item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq} 1538@item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk} 1539@item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_tile} 1540@item @samp{.kl} @tab @samp{.widekl} @tab @samp{.uintr} @tab @samp{.hreset} 1541@item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5} 1542@item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme} 1543@item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16} 1544@item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru} 1545@item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb} 1546@item @samp{.tlbsync} 1547@end multitable 1548 1549Apart from the warning, there are only two other effects on 1550@code{@value{AS}} operation; Firstly, if you specify a CPU other than 1551@samp{i486}, then shift by one instructions such as @samp{sarl $1, %eax} 1552will automatically use a two byte opcode sequence. The larger three 1553byte opcode sequence is used on the 486 (and when no architecture is 1554specified) because it executes faster on the 486. Note that you can 1555explicitly request the two byte opcode by writing @samp{sarl %eax}. 
Secondly, if you specify @samp{i8086}, @samp{i186}, or @samp{i286},
@emph{and} @samp{.code16} or @samp{.code16gcc} then byte offset
conditional jumps will be promoted when necessary to a two instruction
sequence consisting of a conditional jump of the opposite sense around
an unconditional jump to the target.

Following the CPU architecture (but not a sub-architecture, which are those
starting with a dot), you may specify @samp{jumps} or @samp{nojumps} to
control automatic promotion of conditional jumps.  @samp{jumps} is the
default, and enables jump promotion; all external jumps will be of the long
variety, and file-local jumps will be promoted as necessary
(@pxref{i386-Jumps}).  @samp{nojumps} leaves external conditional jumps as
byte offset jumps, and warns about file-local conditional jumps that
@code{@value{AS}} promotes.
Unconditional jumps are treated as for @samp{jumps}.

For example

@smallexample
 .arch i8086,nojumps
@end smallexample

@node i386-ISA
@section AMD64 ISA vs. Intel64 ISA

There are some discrepancies between AMD64 and Intel64 ISAs.

@itemize @bullet
@item For @samp{movsxd} with 16-bit destination register, AMD64
supports 32-bit source operand and Intel64 supports 16-bit source
operand.

@item For far branches (with explicit memory operand), both ISAs support
32- and 16-bit operand size.  Intel64 additionally supports 64-bit
operand size, encoded as @samp{ljmpq} and @samp{lcallq} in AT&T syntax
and with an explicit @samp{tbyte ptr} operand size specifier in Intel
syntax.

@item @samp{lfs}, @samp{lgs}, and @samp{lss} similarly allow for 16-
and 32-bit operand size (32- and 48-bit memory operand) in both ISAs,
while Intel64 additionally supports 64-bit operand size (80-bit memory
operands).

@end itemize

@node i386-Bugs
@section AT&T Syntax bugs

The UnixWare assembler, and probably other AT&T derived ix86 Unix
assemblers, generate floating point instructions with reversed source
and destination registers in certain cases.  Unfortunately, gcc and
possibly many other programs use this reversed syntax, so we're stuck
with it.

For example

@smallexample
        fsub %st,%st(3)
@end smallexample
@noindent
results in @samp{%st(3)} being updated to @samp{%st - %st(3)} rather
than the expected @samp{%st(3) - %st}.  This happens with all the
non-commutative arithmetic floating point operations with two register
operands where the source register is @samp{%st} and the destination
register is @samp{%st(i)}.

@node i386-Notes
@section Notes

@cindex i386 @code{mul}, @code{imul} instructions
@cindex @code{mul} instruction, i386
@cindex @code{imul} instruction, i386
@cindex @code{mul} instruction, x86-64
@cindex @code{imul} instruction, x86-64
There is some trickery concerning the @samp{mul} and @samp{imul}
instructions that deserves mention.  The 16-, 32-, 64- and 128-bit expanding
multiplies (base opcode @samp{0xf6}; extension 4 for @samp{mul} and 5
for @samp{imul}) can be output only in the one operand form.  Thus,
@samp{imul %ebx, %eax} does @emph{not} select the expanding multiply;
the expanding multiply would clobber the @samp{%edx} register, and this
would confuse @code{@value{GCC}} output.  Use @samp{imul %ebx} to get the
64-bit product in @samp{%edx:%eax}.

We have added a two operand form of @samp{imul} when the first operand
is an immediate mode expression and the second operand is a register.
This is just a shorthand, so that multiplying @samp{%eax} by 69, for
example, can be done with @samp{imul $69, %eax} rather than @samp{imul
$69, %eax, %eax}.
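
The @samp{imul} forms discussed in these notes, side by side (a sketch
in 32-bit AT&T syntax; the registers are arbitrary and 69 is the example
constant from the text):

@smallexample
        imul    %ebx            # expanding: %edx:%eax = %eax * %ebx
        imul    %ebx, %eax      # not expanding: %eax = %eax * %ebx
        imul    $69, %eax, %eax # three operand form: %eax = %eax * 69
        imul    $69, %eax       # shorthand for the line above
@end smallexample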