dnl  x86 fat binary entrypoints.

dnl  Contributed to the GNU project by Kevin Ryde (original x86_32 code) and
dnl  Torbjorn Granlund (port to x86_64)

dnl  Copyright 2003, 2009, 2011, 2012 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.

dnl  The GNU MP Library is free software; you can redistribute it and/or
dnl  modify it under the terms of the GNU Lesser General Public License as
dnl  published by the Free Software Foundation; either version 3 of the
dnl  License, or (at your option) any later version.

dnl  The GNU MP Library is distributed in the hope that it will be useful,
dnl  but WITHOUT ANY WARRANTY; without even the implied warranty of
dnl  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
dnl  Lesser General Public License for more details.

dnl  You should have received a copy of the GNU Lesser General Public License
dnl  along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.

include(`../config.m4')


dnl  Forcibly disable profiling.
dnl
dnl  The entrypoints and inits are small enough not to worry about; the real
dnl  routines arrived at will have any profiling.  Also, the way the code
dnl  here ends with a jump means we won't work properly with the
dnl  "instrument" profiling scheme anyway.

define(`WANT_PROFILING',no)


dnl  We define PIC_OR_DARWIN as a helper symbol, then use it for suppressing
dnl  normal, fast call code, since that triggers problems on darwin.
dnl
dnl  FIXME: There might be a more elegant solution, adding less overhead.

ifdef(`DARWIN',
`define(`PIC_OR_DARWIN')')
ifdef(`PIC',
`define(`PIC_OR_DARWIN')')

ABI_SUPPORT(DOS64)
ABI_SUPPORT(STD64)

	TEXT

dnl  Usage: FAT_ENTRY(name, offset)
dnl
dnl  Emit a fat binary entrypoint function of the given name.  This is the
dnl  normal entry for applications, e.g. __gmpn_add_n.
dnl
dnl  The code simply jumps through the function pointer in __gmpn_cpuvec at
dnl  the given "offset" (in bytes).
dnl
dnl  For non-PIC, the jumps are 5 bytes each; aligning them to 8 should be
dnl  fine for all x86s.
dnl
dnl  For ELF/DARWIN PIC, the jumps are 20 bytes each, and are best aligned to
dnl  16 to ensure at least the first two instructions don't cross a cache line
dnl  boundary.
dnl
dnl  For DOS64, the jumps are 6 bytes.  The same form works also for GNU/Linux
dnl  (at least with certain assembler/linkers) but FreeBSD 8.2 crashes.  Not
dnl  tested on Darwin, Solaris, NetBSD, etc.
dnl
dnl  Note the extra `' ahead of PROLOGUE obscures it from the HAVE_NATIVE
dnl  grepping in configure, stopping that code trying to eval something with
dnl  $1 in it.

define(FAT_ENTRY,
m4_assert_numargs(2)
`ifdef(`HOST_DOS64',
`	ALIGN(8)
`'PROLOGUE($1)
	jmp	*$2+GSYM_PREFIX`'__gmpn_cpuvec(%rip)
EPILOGUE()
',
`	ALIGN(ifdef(`PIC',16,8))
`'PROLOGUE($1)
ifdef(`PIC_OR_DARWIN',
`	LEA(	GSYM_PREFIX`'__gmpn_cpuvec, %rax)
	jmp	*$2(%rax)
',`dnl non-PIC
	jmp	*GSYM_PREFIX`'__gmpn_cpuvec+$2
')
EPILOGUE()
')')


dnl  FAT_ENTRY for each CPUVEC_FUNCS_LIST
dnl

define(`CPUVEC_offset',0)
foreach(i,
`FAT_ENTRY(MPN(i),CPUVEC_offset)
define(`CPUVEC_offset',eval(CPUVEC_offset + 8))',
CPUVEC_FUNCS_LIST)


dnl  Usage: FAT_INIT(name, offset)
dnl
dnl  Emit a fat binary initializer function of the given name.  These
dnl  functions are the initial values for the pointers in __gmpn_cpuvec.
dnl
dnl  The code simply calls __gmpn_cpuvec_init, and then jumps back through
dnl  the __gmpn_cpuvec pointer at the given "offset".  Here the offset is a
dnl  pointer index rather than a byte count; fat_init scales it by 8.
dnl  __gmpn_cpuvec_init will have stored the address of the selected
dnl  implementation there.
dnl
dnl  Only one of these routines will be executed, and only once, since after
dnl  that all the __gmpn_cpuvec pointers go to real routines.
dnl  So there's no need for anything special here, just something small and
dnl  simple.  To keep code size down, "fat_init" is a shared bit of code,
dnl  arrived at with the offset (a pointer index) in %al.  %al is used since
dnl  the movb instruction is 2 bytes where a mov to %eax would be 5.
dnl
dnl  Note having `PROLOGUE in FAT_INIT obscures that PROLOGUE from the
dnl  HAVE_NATIVE grepping in configure, preventing that code trying to eval
dnl  something with $1 in it.
dnl
dnl  We need to preserve parameter registers over the __gmpn_cpuvec_init call.

define(FAT_INIT,
m4_assert_numargs(2)
`PROLOGUE($1)
	mov	$`'$2, %al
	jmp	L(fat_init)
EPILOGUE()
')

dnl  FAT_INIT for each CPUVEC_FUNCS_LIST
dnl

define(`CPUVEC_offset',0)
foreach(i,
`FAT_INIT(MPN(i`'_init),CPUVEC_offset)
define(`CPUVEC_offset',eval(CPUVEC_offset + 1))',
CPUVEC_FUNCS_LIST)

L(fat_init):
	C al	__gmpn_cpuvec pointer index (scaled by 8 in the jump below)

	movzbl	%al, %eax
IFSTD(`	push	%rdi	')
IFSTD(`	push	%rsi	')
	push	%rdx
	push	%rcx
	push	%r8
	push	%r9
	push	%rax
	CALL(	__gmpn_cpuvec_init)
	pop	%rax
	pop	%r9
	pop	%r8
	pop	%rcx
	pop	%rdx
IFSTD(`	pop	%rsi	')
IFSTD(`	pop	%rdi	')
ifdef(`PIC_OR_DARWIN',`
	LEA(	GSYM_PREFIX`'__gmpn_cpuvec, %r10)
	jmp	*(%r10,%rax,8)
',`dnl non-PIC
	jmp	*GSYM_PREFIX`'__gmpn_cpuvec(,%rax,8)
')


C long __gmpn_cpuid (char dst[12], int id);
C
C This is called only 3 times, so just something simple and compact is fine.


define(`rp',  `%rdi')
define(`idx', `%rsi')

PROLOGUE(__gmpn_cpuid)
	FUNC_ENTRY(2)
	mov	%rbx, %r8
	mov	R32(idx), R32(%rax)
	cpuid
	mov	%ebx, (rp)
	mov	%edx, 4(rp)
	mov	%ecx, 8(rp)
	mov	%r8, %rbx
	FUNC_EXIT()
	ret
EPILOGUE()