
<html>

<head>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	<title>SLJIT tutorial</title>

	<style type="text/css">
		body {
			background-color: #707070;
			color: #000000;
			font-family: "garamond"
		}
		td.main {
			background-color: #ffffff;
			color: #000000;
			font-family: "garamond"
		}
	</style>
</head>

<body>

<center>
<table width="760" cellspacing=0 cellpadding=0>
<tr height=20><td width=20 class="main"></td><td width=720 class="main"></td><td width=20 class="main"></td></tr>
<tr><td width=20 class="main"></td><td width=720 class="main">

<center>
<a href="http://sourceforge.net"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=248047&type=2" width="125" height="37" border="0" alt="SourceForge.net Logo" /></a>
</center>
<h1><center>SLJIT tutorial</center></h1>

<h2>Before you start</h2>

<a href="">Download the tutorial sources</a><br>
<br>
SLJIT is a lightweight, platform-independent JIT compiler that is easy to
embed in your own project. Because it is 'stack-less', SLJIT places
some limits on register usage.<br>
<br>
Here are some other JIT compilers I have looked into recently, listed in case you are interested:<br>

<ul>
	<b>Libjit/liblightning:</b> - the backend of GNU.net<br>
	<b>Libgccjit:</b> - introduced in GCC 5.0. It is different from the other JIT libraries:
	using it feels like constructing C code, and it uses the GCC backend.<br>
	<b>AsmJIT:</b> - derived from the well-known V8 project (the JavaScript engine in Chrome),
	supports only X86/X86_64.<br>
	<b>DynASM:</b> - used in LuaJIT.<br>
</ul>

<br>
AsmJIT and DynASM work at the instruction level, so using them looks like writing assembly.
SLJIT also looks like assembly, but it hides the details of the specific CPU, which makes
the code more general and portable. Libjit works at a higher level, and with libgccjit,
as mentioned above, you are really constructing C code.<br>

<h2>First program</h2>

Usage of SLJIT:
<ul>
1. 
#include "sljitLir.h" at the top of your C/C++ program<br>
2. Compile with sljit_src/sljitLir.c<br>
</ul>

All examples can be compiled like this:
<ul>
gcc -Wall -Ipath/to/sljit_src -DSLJIT_CONFIG_AUTO=1 \<br>
	<ul><b>xxx.c</b> path/to/sljit_src/sljitLir.c -o program</ul>
</ul>

OK, let's take a look at the first program. In it we create a function that
returns the sum of its 3 arguments.<br>
<br>
<div style='font-family:Courier New;font-size:11px'>
<ul>
#include "sljitLir.h"<br>
 <br>
#include &lt;stdio.h&gt;<br>
#include &lt;stdlib.h&gt;<br>
 <br>
typedef sljit_sw (*func3_t)(sljit_sw a, sljit_sw b, sljit_sw c);<br>
 <br>
static int add3(sljit_sw a, sljit_sw b, sljit_sw c)<br>
{<br>
	<ul>
	void *code;<br>
	sljit_uw len;<br>
	func3_t func;<br>
	 <br>
	/* Create a SLJIT compiler */<br>
	struct sljit_compiler *C = sljit_create_compiler();<br>
	 <br>
	/* Start a context (function entry) with 3 arguments; the parameters are discussed later */<br>
	sljit_emit_enter(C, 0, 3, 1, 3, 0, 0, 0);<br>
	 <br>
	/* The first argument of the function is in register SLJIT_S0, the 2nd in SLJIT_S1, etc. 
*/<br>
	/* R0 = first */<br>
	sljit_emit_op1(C, SLJIT_MOV, SLJIT_R0, 0, SLJIT_S0, 0);<br>
	 <br>
	/* R0 = R0 + second */<br>
	sljit_emit_op2(C, SLJIT_ADD, SLJIT_R0, 0, SLJIT_R0, 0, SLJIT_S1, 0);<br>
	 <br>
	/* R0 = R0 + third */<br>
	sljit_emit_op2(C, SLJIT_ADD, SLJIT_R0, 0, SLJIT_R0, 0, SLJIT_S2, 0);<br>
	 <br>
	/* This statement moves R0 to the return register and returns */<br>
	/* (in fact, R0 is the return register itself) */<br>
	sljit_emit_return(C, SLJIT_MOV, SLJIT_R0, 0);<br>
	 <br>
	/* Generate machine code */<br>
	code = sljit_generate_code(C);<br>
	len = sljit_get_generated_code_size(C);<br>
	 <br>
	/* Execute code */<br>
	func = (func3_t)code;<br>
	printf("func return %ld\n", (long)func(a, b, c));<br>
	 <br>
	/* dump_code(code, len); */<br>
	 <br>
	/* Clean up */<br>
	sljit_free_compiler(C);<br>
	sljit_free_code(code);<br>
	return 0;<br>
	</ul>
}<br>
 <br>
int main()<br>
{<br>
	<ul>
	return add3(4, 5, 6);<br>
	</ul>
}<br>
</ul>
</div>

<br>
The function sljit_emit_enter creates a context: it saves some registers to the stack
and sets up a call frame. sljit_emit_return restores the saved registers and cleans up
the frame. SLJIT is designed to be embedded into other applications, so the code it generates
has to follow some basic rules.<br>
<br>
These rules are called the Application Binary Interface, or ABI for short. Here is the
document for X86_64 CPUs (<a href="http://www.x86-64.org/documentation/abi.pdf">ABI.pdf</a>);
almost all Linux/Unix systems follow this standard. MS Windows has its own; read this for more:
<a href="http://en.wikipedia.org/wiki/X86_calling_conventions">X86_calling_conventions</a><br>
<br>
When I first read the documentation of sljit_emit_enter, the parameters 'saveds' and 'scratches'
confused me. 
The fact is that CPU registers play different roles in the ABI spec:
some are used to pass arguments, some are 'callee-saved', and some are
'temporary'. Take X86_64 for example: RAX, R10 and R11 are temporary, which means
they may be changed by a call instruction. RBX and R12-R15 are callee-saved, so they
keep their values across a call. The rule is that every function must save
those registers before using them.<br>
<br>
Fortunately, SLJIT does most of this for us: SLJIT_S[0-9] represent the 'safe' (saved)
registers, while SLJIT_R[0-9] are only for temporary use.<br>
<br>
When a function starts, SLJIT moves the function arguments into the S0, S1 and S2 registers,
which means function arguments are always 'safe' within the context. Because SLJIT does not
use the stack to pass arguments, it supports at most 3 arguments.<br>
<br>
sljit_emit_opX is easy to understand. In SLJIT a data value is represented by 2
parameters; it can be a register, data in memory, or an immediate number.<br>
<br>

<table align="center" cellspacing="0">
<tr><td>First parameter</td>		<td>Second parameter</td>	<td>Meaning</td></tr>
<tr><td>SLJIT_R*, SLJIT_S*</td>		<td>0</td>			<td>Temp/saved registers</td></tr>
<tr><td>SLJIT_IMM</td>			<td>Number</td>			<td>Immediate number</td></tr>
<tr><td>SLJIT_MEM</td>			<td>Address</td>		<td>In-memory data at an absolute address</td></tr>
<tr><td>SLJIT_MEM1(r)</td>		<td>Offset</td>			<td>In-memory data at [r + offset]</td></tr>
<tr><td>SLJIT_MEM2(r1, r2)</td>		<td>Shift (size)</td>		<td>In-memory array: r1 is the base address, r2 is the index, <br>
							and the shift gives the element size (0 for bytes, 1 for shorts, <br>
							2 for 4 bytes, 3 for 8 bytes)</td></tr>
</table>

<h2>Branch</h2>
<div style='font-family:Courier New;font-size:11px'>
<ul>
#include "sljitLir.h"<br>
 <br>
#include &lt;stdio.h&gt;<br>
#include &lt;stdlib.h&gt;<br>
 <br>
typedef sljit_sw (*func3_t)(sljit_sw a, sljit_sw b, sljit_sw 
c);<br>
 <br>
/*<br>
  In this example, we generate a function like this:<br>
 <br>
sljit_sw func(sljit_sw a, sljit_sw b, sljit_sw c)<br>
{<br>
	<ul>
	if ((a &amp; 1) == 0)<br>
		<ul>
		return c;<br>
		</ul>
	return b;<br>
</ul>
}<br>
 <br>
 */<br>
static int branch(sljit_sw a, sljit_sw b, sljit_sw c)<br>
{<br>
	<ul>
	void *code;<br>
	sljit_uw len;<br>
	func3_t func;<br>
	 <br>
	struct sljit_jump *ret_c;<br>
	struct sljit_jump *out;<br>
	 <br>
	/* Create a SLJIT compiler */<br>
	struct sljit_compiler *C = sljit_create_compiler();<br>
	 <br>
	/* 3 args, 1 temp reg, 3 saved regs */<br>
	sljit_emit_enter(C, 0, 3, 1, 3, 0, 0, 0);<br>
	 <br>
	/* R0 = a &amp; 1, S0 is argument a */<br>
	sljit_emit_op2(C, SLJIT_AND, SLJIT_R0, 0, SLJIT_S0, 0, SLJIT_IMM, 1);<br>
	 <br>
	/* if R0 == 0 then jump to ret_c; the target of ret_c is assigned later */<br>
	ret_c = sljit_emit_cmp(C, SLJIT_EQUAL, SLJIT_R0, 0, SLJIT_IMM, 0);<br>
	 <br>
	/* R0 = b, S1 is argument b */<br>
	sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_S1, 0);<br>
	 <br>
	/* jump to out */<br>
	out = sljit_emit_jump(C, SLJIT_JUMP);<br>
	 <br>
	/* this is where 'ret_c' should jump to: emit a label and attach it to ret_c */<br>
	sljit_set_label(ret_c, sljit_emit_label(C));<br>
	 <br>
	/* R0 = c, S2 is argument c */<br>
	sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_S2, 0);<br>
	 <br>
	/* this is where 'out' should jump to */<br>
	sljit_set_label(out, sljit_emit_label(C));<br>
	 <br>
	/* end of function */<br>
	sljit_emit_return(C, SLJIT_MOV, SLJIT_RETURN_REG, 0);<br>
	 <br>
	/* Generate machine code */<br>
	code = sljit_generate_code(C);<br>
	len = sljit_get_generated_code_size(C);<br>
	 <br>
	/* Execute code */<br>
	func = (func3_t)code;<br>
	printf("func return %ld\n", (long)func(a, b, c));<br>
	 <br>
	/* dump_code(code, len); */<br>
	 <br>
	/* Clean up 
*/<br>
	sljit_free_compiler(C);<br>
	sljit_free_code(code);<br>
	return 0;<br>
</ul>
}<br>
 <br>
int main()<br>
{<br>
<ul>
	return branch(4, 5, 6);<br>
</ul>
}<br>
</ul>
</div>

The keys to implementing a branch are 'struct sljit_jump' and 'struct sljit_label'.
A 'jump' holds a jump instruction, but it does not know where to jump until
you attach a label to it. A 'label' is a code address, just like a label in assembly
language.<br>
<br>
sljit_emit_cmp and sljit_emit_jump generate a conditional and an unconditional jump respectively.
Take this statement as an example:<br>
<ul>
ret_c = sljit_emit_cmp(C, SLJIT_EQUAL, SLJIT_R0, 0, SLJIT_IMM, 0);<br>
</ul>
It creates a jump instruction whose condition is that R0 equals 0;
the jump target is assigned later by the sljit_set_label statement.<br>
<br>
In this example, it creates a branch like this:<br>
<ul>
	<ul>
	R0 = a &amp; 1;<br>
	if R0 == 0 then goto ret_c;<br>
	R0 = b;<br>
	goto out;<br>
	</ul>
ret_c:<br>
	<ul>
	R0 = c;<br>
	</ul>
out:<br>
	<ul>
	return R0;<br>
	</ul>
</ul>
<br>
This is how a high-level language compiler handles a branch.<br>
<br>

<h2>Loop</h2>

The loop example is similar to the branch example. 
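<br>
As a rough aid for reading the listing that follows, the label/jump structure of a counted loop can be written in plain C with gotos. This is only a sketch and not SLJIT code: word_t, loop_reference and loop_with_gotos are hypothetical names, and word_t merely stands in for sljit_sw.<br>

```c
#include <stdint.h>

/* Hypothetical stand-in for sljit_sw: a signed, pointer-sized integer. */
typedef intptr_t word_t;

/* The function we want to generate, written as an ordinary for loop. */
word_t loop_reference(word_t a, word_t b)
{
	word_t i;
	word_t ret = 0;

	for (i = 0; i < a; ++i)
		ret += b;
	return ret;
}

/* The same function with the control flow spelled out as labels and jumps. */
word_t loop_with_gotos(word_t a, word_t b)
{
	word_t i = 0;   /* loop counter, the role played by a temp register */
	word_t ret = 0; /* accumulator, the role played by the return register */

loopstart:
	if (i >= a)     /* conditional forward jump: leave the loop */
		goto out;
	ret += b;
	i += 1;
	goto loopstart; /* unconditional backward jump */
out:
	return ret;
}
```

Each goto corresponds to one emitted jump and each C label to one emitted label. The backward jump already knows its target when it is emitted, while the forward jump to out can only be given its label afterwards.<br>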

<div style='font-family:Courier New;font-size:11px'>
<ul>
/*<br>
  In this example, we generate a function like this:<br>
 <br>
sljit_sw func(sljit_sw a, sljit_sw b)<br>
{<br>
<ul>
	sljit_sw i;<br>
	sljit_sw ret = 0;<br>
	for (i = 0; i &lt; a; ++i) {<br>
		<ul>
		ret += b;<br>
		</ul>
	}<br>
	return ret;<br>
</ul>
}<br>
*/<br>
<br>
<ul>
	/* 2 args, 2 temp regs, 2 saved regs */<br>
	sljit_emit_enter(C, 0, 2, 2, 2, 0, 0, 0);<br>
	 <br>
	/* R1 = 0 */<br>
	sljit_emit_op2(C, SLJIT_XOR, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_R1, 0);<br>
	/* RET = 0 */<br>
	sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0);<br>
	/* loopstart: */<br>
	loopstart = sljit_emit_label(C);<br>
	/* R1 >= a --> jump out (signed compare, matching the signed 'i &lt; a' above) */<br>
	out = sljit_emit_cmp(C, SLJIT_SIG_GREATER_EQUAL, SLJIT_R1, 0, SLJIT_S0, 0);<br>
	/* RET += b */<br>
	sljit_emit_op2(C, SLJIT_ADD, SLJIT_RETURN_REG, 0, SLJIT_RETURN_REG, 0, SLJIT_S1, 0);<br>
	/* R1 += 1 */<br>
	sljit_emit_op2(C, SLJIT_ADD, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_IMM, 1);<br>
	/* jump loopstart */<br>
	sljit_set_label(sljit_emit_jump(C, SLJIT_JUMP), loopstart);<br>
	/* out: */<br>
	sljit_set_label(out, sljit_emit_label(C));<br>
	 <br>
	/* return RET */<br>
	sljit_emit_return(C, SLJIT_MOV, SLJIT_RETURN_REG, 0);<br>
</ul>
</ul>
</div>

After this example, you are ready to construct any program that contains complex branches
and loops.<br>
<br>
Here is an interesting fact: 'xor reg, reg' is better than 'mov reg, 0' because its encoding
is a few bytes shorter on X86 machines.<br>
<br>
I will give only the key code in the rest of this tutorial; the full source of each
chapter can be found in the attachment.<br>


<h2>Call external function</h2>

It's easy to call an external function in SLJIT: we use sljit_emit_ijump with a SLJIT_CALL*
operation to do so.<br>
<br>
SLJIT_CALL[N] is used to call a function with N arguments. SLJIT has 
only SLJIT_CALL0,
CALL1, CALL2 and CALL3, which means you can call a function with at most 3 arguments (that
disappointed me: no chance to call fwrite from SLJIT). The arguments for the callee
are passed in SLJIT_R0, R1 and R2. Keep in mind that a call may overwrite these 'temp registers',
so save any values you still need afterwards.<br>
<br>
Assume that we have an external function:<br>
<ul>
	sljit_sw print_num(sljit_sw a);
</ul>

JIT code to call print_num(S1):

<div style='font-family:Courier New;font-size:11px'>
<ul>
	/* R0 = S1; */<br>
	sljit_emit_op1(C, SLJIT_MOV, SLJIT_R0, 0, SLJIT_S1, 0);<br>
	/* print_num(R0) */<br>
	sljit_emit_ijump(C, SLJIT_CALL1, SLJIT_IMM, SLJIT_FUNC_OFFSET(print_num));<br>
</ul>
</div>
<br>
This code calls an immediate value (the address of print_num), which is linked properly when the
program is loaded. There is no problem when you compile and execute in the same run, but if you plan
to save the generated code to a file and load and execute it later, that address may not be what you
expect: on platforms that support PIC, print_num may be relocated to another
address at run time. 
Check this out:
<a href="http://en.wikipedia.org/wiki/Position-independent_code">PIC</a><br>
<br>

<h2>Structure access</h2>

SLJIT uses SLJIT_MEM1 to implement [Reg + offset] memory access.<br>
<div style='font-family:Courier New;font-size:11px'>
<ul>
struct point_st {<br>
	<ul>
	sljit_sw x;<br>
	int y;<br>
	short z;<br>
	char d;<br>
	char e;<br>
	</ul>
};<br>
<br>
sljit_emit_op1(C, SLJIT_MOV_SI, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_S0),<br>
<ul>
SLJIT_OFFSETOF(struct point_st, y));<br>
</ul>
</ul>
</div>

In this case, SLJIT_S0 holds the address of the point_st structure, and the offset of member 'y'
is determined at compile time. Note that the MOV operation can carry a
'signedness/size' suffix; here _SI means 'signed 32-bit integer'. The list of
suffixes:<br>
<ul>
	<b>UB</b> = unsigned byte (8 bit)<br>
	<b>SB</b> = signed byte (8 bit)<br>
	<b>UH</b> = unsigned half (16 bit)<br>
	<b>SH</b> = signed half (16 bit)<br>
	<b>UI</b> = unsigned int (32 bit)<br>
	<b>SI</b> = signed int (32 bit)<br>
	<b>P</b>  = pointer (sljit_p) size<br>
</ul>

<h2>Array access</h2>

SLJIT uses SLJIT_MEM2 to access arrays, like this:<br>

<div style='font-family:Courier New;font-size:11px'>
<ul>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM2(SLJIT_S0, SLJIT_S2),<br>
<ul>
SLJIT_WORD_SHIFT);
</ul>
</ul>
</div>

This statement generates code like this:<br>
<ul>
WORD S0[];<br>
R0 = S0[S2]<br>
</ul>
<br>
The array S0 is declared to be WORD, so each element is sizeof(sljit_sw) bytes long. 
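<br>
The address computed for a SLJIT_MEM2 operand can be pictured in plain C as the base address plus the index shifted left by the shift amount. The sketch below is not SLJIT code: word_t, word_shift and load_word_mem2 are hypothetical helpers, with word_t standing in for sljit_sw and word_shift for SLJIT_WORD_SHIFT.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sljit_sw: a signed, pointer-sized integer. */
typedef intptr_t word_t;

/* Hypothetical stand-in for SLJIT_WORD_SHIFT: log2 of the word size in bytes. */
unsigned word_shift(void)
{
	return sizeof(word_t) == 8 ? 3u : 2u;
}

/* Reads one word from base + (index << shift), i.e. R0 = S0[S2]. */
word_t load_word_mem2(const void *base, size_t index, unsigned shift)
{
	const unsigned char *addr;
	word_t value;

	assert(shift <= 3); /* 0, 1, 2, 3 select 1, 2, 4, 8 byte elements */
	addr = (const unsigned char *)base + (index << shift);
	memcpy(&value, addr, sizeof value); /* copy instead of casting the pointer */
	return value;
}
```

For example, given word_t arr[4] = {10, 20, 30, 40}, the call load_word_mem2(arr, 2, word_shift()) reads arr[2].<br>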
SLJIT uses a 'shift' to represent the element size (0 for a single byte, 1 for 2
bytes, 2 for 4 bytes, 3 for 8 bytes).<br>
<br>
The file array_access.c demonstrates an array-printing example, which should be easy
to understand.<br>

<h2>Local variables</h2>

SLJIT provides SLJIT_MEM1(SLJIT_SP) to access the space reserved by the
last parameter of sljit_emit_enter.<br>
In this example we have to pass the address of an array to print_arr, so a local variable
is the only choice.<br>

<div style='font-family:Courier New;font-size:11px'>
<ul>
	/* reserve space on the stack for sljit_sw arr[3] */<br>
	sljit_emit_enter(C, 0, 3, 2, 3, 0, 0, 3 * sizeof(sljit_sw));<br>
	/*                  opt arg R  S  FR FS local_size */<br>
	 <br>
	/* arr[0] = S0, SLJIT_SP is the start address of the local variables */<br>
	sljit_emit_op1(C, SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), 0, SLJIT_S0, 0);<br>
	/* arr[1] = S1 */<br>
	sljit_emit_op1(C, SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), 1 * sizeof(sljit_sw), SLJIT_S1, 0);<br>
	/* arr[2] = S2 */<br>
	sljit_emit_op1(C, SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), 2 * sizeof(sljit_sw), SLJIT_S2, 0);<br>
	 <br>
	/* R0 = arr; SLJIT_SP is in fact the address of arr, but SLJIT does not allow moving it directly */<br>
	sljit_get_local_base(C, SLJIT_R0, 0, 0);	/* get the address of the local variables */<br>
	sljit_emit_op1(C, SLJIT_MOV, SLJIT_R1, 0, SLJIT_IMM, 3);	/* R1 = 3; */<br>
	sljit_emit_ijump(C, SLJIT_CALL2, SLJIT_IMM, SLJIT_FUNC_OFFSET(print_arr));<br>
	sljit_emit_return(C, SLJIT_MOV, SLJIT_R0, 0);<br>
</ul>
</div>
<br>
SLJIT_SP can only be used in SLJIT_MEM1(SLJIT_SP). 
In this case, SP is the
address of 'arr', but we cannot assign it to a register with the SLJIT_MOV operation.
Instead, we use sljit_get_local_base, which loads the address of the
local variables plus an offset into the target.<br>

<h2>Brainfuck compiler</h2>

OK, the basic usage of SLJIT ends here. For more detail, I suggest reading
sljitLir.h directly. Have fun hacking on the wonders of SLJIT!<br>
<br>
An introduction to the Brainfuck machine can be found here:
<a href="http://en.wikipedia.org/wiki/Brainfuck">Brainfuck</a><br>
<br>

<h2>Extra</h2>

1. Dump_code function<br>
SLJIT does not provide a disassembler; this is a simple function that does the job (X86 only).<br>
<br>

<div style='font-family:Courier New;font-size:11px'>
<ul>
static void dump_code(void *code, sljit_uw len)<br>
{<br>
<ul>
	/* write the raw machine code to a file so that objdump can read it */<br>
	FILE *fp = fopen("/tmp/slj_dump", "wb");<br>
	if (!fp)<br>
		<ul>
		return;<br>
		</ul>
	fwrite(code, len, 1, fp);<br>
	fclose(fp);<br>
</ul>
#if defined(SLJIT_CONFIG_X86_64)<br>
<ul>
	system("objdump -b binary -m i386:x86-64 -D /tmp/slj_dump");<br>
</ul>
#elif defined(SLJIT_CONFIG_X86_32)<br>
<ul>
	system("objdump -b binary -m i386 -D /tmp/slj_dump");<br>
</ul>
#endif<br>
}
</ul>
</div>

The disassembly of the branch example:<br>
 <br>
0000000000000000 &lt;.data&gt;:<br>
<ul>
<table>
<tr><td>0:</td><td>53</td><td>push   %rbx</td></tr>
<tr><td>1:</td><td>41 57</td><td>push   %r15</td></tr>
<tr><td>3:</td><td>41 56</td><td>push   %r14</td></tr>
<tr><td>5:</td><td>48 8b df</td><td>mov    %rdi,%rbx</td></tr>
<tr><td>8:</td><td>4c 8b fe</td><td>mov    %rsi,%r15</td></tr>
<tr><td>b:</td><td>4c 8b f2</td><td>mov    %rdx,%r14</td></tr>
<tr><td>e:</td><td>48 83 ec 10</td><td>sub    $0x10,%rsp</td></tr>
<tr><td>12:</td><td>48 89 d8</td><td>mov    %rbx,%rax</td></tr>
<tr><td>15:</td><td>48 83 e0 01</td><td>and    $0x1,%rax</td></tr>
<tr><td>19:</td><td>48 83 f8 00</td><td>cmp    
$0x0,%rax</td></tr>
<tr><td>1d:</td><td>74 05</td><td>je     0x24</td></tr>
<tr><td>1f:</td><td>4c 89 f8</td><td>mov    %r15,%rax</td></tr>
<tr><td>22:</td><td>eb 03</td><td>jmp    0x27</td></tr>
<tr><td>24:</td><td>4c 89 f0</td><td>mov    %r14,%rax</td></tr>
<tr><td>27:</td><td>48 83 c4 10</td><td>add    $0x10,%rsp</td></tr>
<tr><td>2b:</td><td>41 5e</td><td>pop    %r14</td></tr>
<tr><td>2d:</td><td>41 5f</td><td>pop    %r15</td></tr>
<tr><td>2f:</td><td>5b</td><td>pop    %rbx</td></tr>
<tr><td>30:</td><td>c3</td><td>retq</td></tr>
</table>
</ul>
<br>
The same function compiled with GCC -O2:<br>
0000000000000000 &lt;func&gt;:<br>
<ul>
<table>
<tr><td>0:</td><td>48 89 d0</td><td>mov    %rdx,%rax</td></tr>
<tr><td>3:</td><td>83 e7 01</td><td>and    $0x1,%edi</td></tr>
<tr><td>6:</td><td>48 0f 45 c6</td><td>cmovne %rsi,%rax</td></tr>
<tr><td>a:</td><td>c3</td><td>retq</td></tr>
</table>
</ul>
<br>
Err... OK, either the optimization here is weak, or the optimization there is crazy... :-)<br>

<table width="100%" cellspacing=0 cellpadding=0>
<tr><td align=right>By wenxichang#163.com, 2015.5.10</td></tr></table>

</td><td width=20 class="main"></td></tr>
<tr height=20><td width=20 class="main"></td><td width=720 class="main"></td><td width=20 class="main"></td></tr>
</table>
</center>

</body>
</html>