#
376bcb54 | 30-Jul-2022 | riastradh <riastradh@NetBSD.org>
x86: Eliminate mfence hotpatch for membar_sync.
The more-compatible LOCK ADD $0,-N(%rsp) turns out to be cheaper than MFENCE anyway. Let's save some space and maintenance and rip out the hotpatching for it.
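For illustration only (not part of the commit), a minimal sketch of the two barrier forms being compared, written as GCC/Clang-style inline assembly; the -8 offset is a placeholder value, not the N used in the tree:

    /* Sketch: a locked read-modify-write of a dummy stack location acts as a
     * full memory barrier on x86; adding 0 leaves the memory unchanged. */
    static inline void
    full_barrier_lock_add(void)
    {
        __asm volatile("lock addl $0,-8(%%rsp)" ::: "memory", "cc");
    }

    /* MFENCE also orders all prior loads and stores, but is typically slower. */
    static inline void
    full_barrier_mfence(void)
    {
        __asm volatile("mfence" ::: "memory");
    }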
#
e0c914a7 | 09-Apr-2022 | riastradh <riastradh@NetBSD.org>
x86: Every load is a load-acquire, so membar_consumer is a noop.
lfence is only needed for MD logic, such as operations on I/O memory rather than normal cacheable memory, or special instructions like RDTSC -- never for MI synchronization between threads/CPUs. No need for hot-patching to do lfence here.
(The x86_lfence function might reasonably be patched on i386 to do lfence for MD logic, but it isn't now and this doesn't change that.)
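As a hedged illustration of the point (not the NetBSD source): under x86's memory model, loads are not reordered against other loads, so a consumer/read barrier only needs to stop the compiler from reordering, for example:

    /* Sketch: a compiler-only barrier is sufficient for membar_consumer on x86. */
    static inline void
    membar_consumer_sketch(void)
    {
        __asm volatile("" ::: "memory");
    }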
#
63587e37 | 17-Apr-2021 | rillig <rillig@NetBSD.org>
sys/arch/amd64: remove trailing whitespace
#
95a0a188 | 19-Jul-2020 | maxv <maxv@NetBSD.org>
Revert most of ad's movs/stos change. Instead do something a lot simpler: declare svs_quad_copy() as used by SVS only, with no need for instrumentation, because SVS is disabled when sanitizers are on.
#
c1ac8a41 | 21-Jun-2020 | bouyer <bouyer@NetBSD.org>
Fix comment
#
248fe10b | 01-Jun-2020 | ad <ad@NetBSD.org>
Reported-by: syzbot+6dd5a230d19f0cbc7814@syzkaller.appspotmail.com
Instrument STOS/MOVS for KMSAN to unbreak it.
#
129e4c2b | 26-Apr-2020 | maxv <maxv@NetBSD.org>
Use the hotpatch framework for LFENCE/MFENCE.
#
c24c993f | 25-Apr-2020 | bouyer <bouyer@NetBSD.org>
Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM guests in GENERIC. Xen support can be disabled at runtime with:
    boot -c
    disable hypervisor
#
e7e5aa03 | 17-Nov-2019 | maxv <maxv@NetBSD.org>
Disable KCOV - by raising the interrupt level - in the TLB IPI handler, because this is only noise.
#
10c5b023 | 14-Nov-2019 | maxv <maxv@NetBSD.org>
Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized memory used by the kernel at run time, and just like kASan and kCSan, it is an excellent feature. It has already detected 38 uninitialized variables in the kernel during my testing, which I have since discreetly fixed.
We use two shadows:
 - "shad", to track uninitialized memory with a bit granularity (1:1). Each bit set to 1 in the shad corresponds to one uninitialized bit of real kernel memory.
 - "orig", to track the origin of the memory with a 4-byte granularity (1:1). Each uint32_t cell in the orig indicates the origin of the associated uint32_t of real kernel memory.
The memory consumption of these shadows is significant, so at least 4GB of RAM is recommended to run kMSan.
The compiler inserts calls to specific __msan_* functions on each memory access, to manage both the shad and the orig and detect uninitialized memory accesses that change the execution flow (like an "if" on an uninitialized variable).
We mark as uninit several types of memory buffers (stack, pools, kmem, malloc, uvm_km), and check each buffer passed to copyout, copyoutstr, bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory that leaves the system. This allows us to detect kernel info leaks in a way that is more efficient and also more user-friendly than KLEAK.
Contrary to kASan, kMSan requires comprehensive coverage, i.e. we cannot tolerate having even one non-instrumented function, because this could cause false positives. kMSan cannot instrument ASM functions, so I converted most of them to __asm__ inlines, which kMSan is able to instrument. Those that remain receive special treatment.
Contrary to kASan again, kMSan uses a TLS, so we must context-switch this TLS during interrupts. We use different contexts depending on the interrupt level.
The orig tracks precisely the origin of a buffer. We use a special encoding for the orig values, and pack together in each uint32_t cell of the orig:
 - a code designating the type of memory (Stack, Pool, etc), and
 - a compressed pointer, which points either (1) to a string containing the name of the variable associated with the cell, or (2) to an area in the kernel .text section which we resolve to a symbol name + offset.
This encoding allows us not to consume extra memory for associating information with each cell, and produces a precise output, that can tell for example the name of an uninitialized variable on the stack, the function in which it was pushed on the stack, and the function where we accessed this uninitialized variable.
kMSan is available with LLVM, but not with GCC.
The code is organized in a way that is similar to kASan and kCSan, so it means that other architectures than amd64 can be supported.
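A rough, hypothetical sketch of an origin-cell encoding of the kind described above (a type code plus a compressed pointer packed into one uint32_t); the field widths and the names are illustrative, not the actual kMSan layout:

    #include <stdint.h>

    #define ORIG_TYPE_STACK 0x1u    /* illustrative type codes */
    #define ORIG_TYPE_POOL  0x2u

    /* Pack a 4-bit type code and a 28-bit offset from a known base. */
    static inline uint32_t
    orig_encode(uint32_t type, uintptr_t ptr, uintptr_t base)
    {
        return (type << 28) | (uint32_t)((ptr - base) & 0x0fffffffu);
    }

    /* Recover the pointer half of the cell. */
    static inline uintptr_t
    orig_decode_ptr(uint32_t cell, uintptr_t base)
    {
        return base + (cell & 0x0fffffffu);
    }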
#
560337f7 | 12-Oct-2019 | maxv <maxv@NetBSD.org>
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
#
125e142f | 18-May-2019 | maxv <maxv@NetBSD.org>
Two changes in the CPU mitigations:
* Micro-optimize: put every mitigation in the same branch. This removes two branches in each exc/int return path, and removes all branches in the syscall return path.
* Modify the SpectreV2 mitigation to be compatible with SpectreV4. I recently realized that both couldn't be enabled at the same time on Intel. This is because initially, when there was just SpectreV2, we could reset the whole IA32_SPEC_CTRL MSR. But then Intel added another bit in it for SpectreV4, so it isn't right to reset it entirely anymore. SSBD needs to stay.
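To illustrate the last point (a sketch, not the NetBSD code): the SpectreV2 path must now read-modify-write IA32_SPEC_CTRL so that the SSBD bit set for SpectreV4 survives; rdmsr()/wrmsr() stand in for the usual x86 MSR accessors:

    #include <stdint.h>

    #define MSR_IA32_SPEC_CTRL  0x48
    #define SPEC_CTRL_IBRS      0x1ULL  /* bit 0 */
    #define SPEC_CTRL_SSBD      0x4ULL  /* bit 2, owned by the SpectreV4 mitigation */

    uint64_t rdmsr(uint32_t msr);
    void wrmsr(uint32_t msr, uint64_t val);

    /* Clear only IBRS on the way out to userland; do not zero the whole MSR. */
    static void
    spec_ctrl_leave_kernel(void)
    {
        uint64_t v = rdmsr(MSR_IA32_SPEC_CTRL);
        wrmsr(MSR_IA32_SPEC_CTRL, v & ~SPEC_CTRL_IBRS);
    }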
#
74b8eea5 | 14-May-2019 | maxv <maxv@NetBSD.org>
Mitigation for INTEL-SA-00233: Microarchitectural Data Sampling (MDS).
It requires a microcode update, now available on the Intel website. The microcode modifies the behavior of the VERW instruction, and makes it flush internal CPU buffers. We hotpatch the return-to-userland path to add VERW.
Two sysctls are added:
    machdep.mds.mitigated = {0/1}     user-settable
    machdep.mds.method    = {string}  constructed by the kernel
The kernel will automatically enable the mitigation if the updated microcode is present. If the new microcode is not present, the user can load it via cpuctl, and set machdep.mds.mitigated=1.
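As a hedged illustration of the flush primitive (not the hotpatched NetBSD sequence): VERW is issued with a memory operand holding a data-segment selector, and with the updated microcode this overwrites the affected internal buffers; the selector value below is an assumption:

    #include <stdint.h>

    static inline void
    mds_flush_sketch(void)
    {
        static const uint16_t sel = 0x10;   /* placeholder: a kernel data selector */
        __asm volatile("verw %0" :: "m"(sel) : "cc");
    }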
#
427af037 | 11-Feb-2019 | cherry <cherry@NetBSD.org>
We reorganise definitions for XEN source support as follows:
    XEN      - common sources required for baseline XEN support.
    XENPV    - sources required for support of XEN in PV mode.
    XENPVHVM - sources required for support for XEN in HVM mode.
    XENPVH   - sources required for support for XEN in PVH mode.
#
afb643e1 | 12-Aug-2018 | maxv <maxv@NetBSD.org>
Move the PCPU area from slot 384 to slot 510, to avoid creating too much fragmentation in the slot space (384 is in the middle of the kernel half of the VA).
#
2681a041 | 13-Jul-2018 | martin <martin@NetBSD.org>
Provide empty SVS_ENTER_NMI/SVS_LEAVE_NMI for kernels w/o options SVS
#
8b7b3795 | 12-Jul-2018 | maxv <maxv@NetBSD.org>
Handle NMIs correctly when SVS is enabled. We store the kernel's CR3 at the top of the NMI stack, and we unconditionally switch to it, because we don't know with which page tables we received the NMI. Hotpatch the whole thing as usual.
This restores the ability to use PMCs on Intel CPUs.
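A hedged C-level sketch of the idea (the real code is assembly in the NMI entry path, and the structure and names here are hypothetical): the kernel CR3 is stashed at the top of the NMI stack and loaded unconditionally, since the NMI may have interrupted code running on the user page tables:

    #include <stdint.h>

    struct nmi_stack_top {
        uint64_t kernel_cr3;    /* saved here when the CPU is set up (assumption) */
    };

    static inline void
    nmi_load_kernel_cr3(const struct nmi_stack_top *top)
    {
        __asm volatile("movq %0,%%cr3" :: "r"(top->kernel_cr3) : "memory");
    }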
#
0223f0c8 | 28-Mar-2018 | maxv <maxv@NetBSD.org>
Add the IBRS mitigation for SpectreV2 on amd64.
Different operations are performed during context transitions:
    user->kernel: IBRS <- 1
    kernel->user: IBRS <- 0
And during context switches:
    user->user:   IBPB <- 0
    kernel->user: IBPB <- 0
    [user->kernel: IBPB <- 0, this one may not be needed]
We use two macros, IBRS_ENTER and IBRS_LEAVE, to set the IBRS bit. The thing is hotpatched for better performance, like SVS.
The idea is that IBRS is a "privileged" bit, which is set to 1 in kernel mode and 0 in user mode. To protect the branch predictor between user processes (which are of the same privilege), we use the IBPB barrier.
The Intel manual also talks about (MWAIT/HLT)+HyperThreading, and says that when using either of the two instructions IBRS must be disabled for better performance on the core. I'm not totally sure about this part, so I'm not adding it now.
IBRS is available only when the Intel microcode update is applied. The mitigation must be enabled manually with machdep.spectreV2.mitigated.
Tested by msaitoh a week ago (but I adapted a few things since). Probably more changes to come.
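For illustration (not the NetBSD macros), the IBPB barrier mentioned above is issued by writing the IBPB bit to the IA32_PRED_CMD MSR; wrmsr() stands in for the usual accessor:

    #include <stdint.h>

    #define MSR_IA32_PRED_CMD   0x49
    #define PRED_CMD_IBPB       0x1ULL  /* bit 0: flush indirect branch predictors */

    void wrmsr(uint32_t msr, uint64_t val);

    /* Issued when switching between user processes of the same privilege, so
     * one process cannot train the branch predictor against another. */
    static void
    ibpb_barrier(void)
    {
        wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
    }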
#
ccc038a8 | 25-Feb-2018 | maxv <maxv@NetBSD.org>
Remove INTRENTRY_L, it's not used anymore.
#
5ab8a4e7 | 22-Feb-2018 | maxv <maxv@NetBSD.org>
Make the machdep.svs_enabled sysctl writable, and add the kernel code needed to disable SVS at runtime.
We set 'svs_enabled' to false, and hotpatch the kernel entry/exit points to eliminate the context switch code.
We need to make sure there is no remote CPU that is executing the code we are hotpatching. So we use two barriers:
* After the first one each CPU is guaranteed to be executing in svs_disable_cpu with interrupts disabled (this way it can't leave this place).
* After the second one it is guaranteed that SVS is disabled, so we flush the cache, enable interrupts and continue execution normally.
Between the two barriers, cpu0 will disable SVS (svs_enabled=false and hotpatch), and each CPU will restore the generic syscall entry point.
Three notes:
* We should call svs_pgg_update(true) afterwards, to put back PG_G on the kernel pages (for better performance). This will be done in another commit.
* The fact that we disable interrupts does not prevent us from receiving an NMI, and it would be problematic. So we need to add some code to verify that PMCs are disabled before hotpatching. This will be done in another commit.
* In svs_disable() we expect each CPU to be online. We need to add a check to make sure they indeed are.
The sysctl allows only a 1->0 transition. There is no point in doing 0->1 transitions anyway, and it would be complicated to implement because we need to re-synchronize the CPU user page tables with the current ones (we lost track of them in the last 1->0 transition).
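A hedged outline of the two-barrier rendezvous described above; the spin-based barriers and the function name are illustrative, not the actual NetBSD cross-call code, and the usual kernel primitives (splhigh, atomic_inc_uint, wbinvd) are assumed to be in scope:

    #include <sys/types.h>
    #include <sys/atomic.h>

    static volatile unsigned int svs_barrier1, svs_barrier2;

    /* Runs on every CPU; cpu0 does the hotpatching between the two barriers
     * while the other CPUs are parked here with interrupts disabled. */
    static void
    svs_disable_cpu_sketch(int is_cpu0, unsigned int ncpu)
    {
        int s = splhigh();                  /* can't leave this function now */

        atomic_inc_uint(&svs_barrier1);
        while (svs_barrier1 < ncpu)
            ;                               /* wait for every CPU to arrive */

        if (is_cpu0) {
            /* svs_enabled = false; hotpatch the entry/exit points here */
        }
        /* each CPU restores the generic syscall entry point here */

        atomic_inc_uint(&svs_barrier2);
        while (svs_barrier2 < ncpu)
            ;                               /* wait until SVS is disabled */

        wbinvd();                           /* flush the cache */
        splx(s);                            /* interrupts back on, continue */
    }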
#
ebc1f703 | 22-Feb-2018 | maxv <maxv@NetBSD.org>
Add a dynamic detection for SVS.
The SVS_* macros are now compiled as skip-noopt. When the system boots, if the cpu is from Intel, they are hotpatched to their real content. Typically:
        jmp     1f
        int3
        int3
        int3
        ...
        int3
        ...
    1:
gets hotpatched to:
        movq    SVS_UTLS+UTLS_KPDIRPA,%rax
        movq    %rax,%cr3
        movq    CPUVAR(KRSP0),%rsp
These two chunks of code are exactly the same size. We put int3 (0xCC) to make sure we never execute there.
In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that the SVS_* macros are small, this jump will likely leave us in the same icache line, so it's pretty fast.
The syscall entry point is special, because there we use a scratch uint64_t not in curcpu but in the UTLS page, and it's difficult to hotpatch this properly. So instead of hotpatching we declare the entry point as an ASM macro, and define two functions: syscall and syscall_svs, the latter being the one used in the SVS case.
While here 'syscall' is optimized not to contain an SVS_ENTER - this way we don't even need to do a jump on the non-SVS case.
When adding pages in the user page tables, make sure we don't have PG_G, now that it's dynamic.
A read-only sysctl is added, machdep.svs_enabled, that tells whether the kernel uses SVS or not.
More changes to come, svs_init() is not very clean.
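A hedged sketch of the patching step implied above (the function name is hypothetical, and the real code must also make the text writable, quiesce the other CPUs and resynchronize instruction fetch):

    #include <stdint.h>
    #include <string.h>

    #define X86_INT3 0xCC   /* breakpoint opcode used as padding */

    /* Replace the "jmp 1f" + int3 padding with real instructions of exactly
     * the same length, so nothing around the patched range moves. */
    static void
    hotpatch_sketch(uint8_t *dst, const uint8_t *newcode, size_t len)
    {
        for (size_t i = 2; i < len; i++) {  /* a short jmp is 2 bytes */
            if (dst[i] != X86_INT3)
                return;                     /* not the expected placeholder */
        }
        memcpy(dst, newcode, len);
    }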
#
2e43b1e6 | 27-Jan-2018 | maxv <maxv@NetBSD.org>
Put the default %cs value in INTR_RECURSE_HWFRAME. Pushing an immediate costs less than reading the %cs register and pushing its value. In all of these cases the value is never allowed to be anything other than GSEL(GCODE_SEL,SEL_KPL).
#
7318ea98 | 27-Jan-2018 | maxv <maxv@NetBSD.org>
Declare and use INTR_RECURSE_ENTRY, an optimized version of INTRENTRY. When processing deferred interrupts, we are always entering the new handler in kernel mode, so there is no point performing the userland checks.
Saves several instructions.
#
21a8fbaf | 27-Jan-2018 | maxv <maxv@NetBSD.org>
Remove DO_DEFERRED_SWITCH and DO_DEFERRED_SWITCH_RETRY, unused.
#
cea874c7 | 21-Jan-2018 | maxv <maxv@NetBSD.org>
Unmap the kernel from userland in SVS, and leave only the needed trampolines. As explained below, SVS should now completely mitigate Meltdown on GENERIC kernels, even though it needs some more tweaking for GENERIC_KASLR.
Until now the kernel entry points looked like:
    FUNC(intr)
        pushq   $ERR
        pushq   $TRAPNO
        INTRENTRY
        ... handle interrupt ...
        INTRFASTEXIT
    END(intr)
With this change they are split and become:
    FUNC(handle)
        ... handle interrupt ...
        INTRFASTEXIT
    END(handle)

    TEXT_USER_BEGIN
    FUNC(intr)
        pushq   $ERR
        pushq   $TRAPNO
        INTRENTRY
        jmp     handle
    END(intr)
    TEXT_USER_END
A new section is introduced, .text.user, that contains minimal kernel entry/exit points. In order to choose what to put in this section, two macros are introduced, TEXT_USER_BEGIN and TEXT_USER_END.
The section is mapped in userland with normal 4K pages.
In GENERIC, the section is 4K-page-aligned and embedded in .text, which is mapped with large pages. That is to say, when an interrupt comes in, the CPU has the user page tables loaded and executes the 'intr' functions on 4K pages; after calling SVS_ENTER (in INTRENTRY) these 4K pages become 2MB large pages, and remain so when executing in kernel mode.
In GENERIC_KASLR, the section is 4K-page-aligned and independent from the other kernel texts. The prekern just picks it up and maps it at a random address.
In GENERIC, SVS should now completely mitigate Meltdown: what we put in .text.user is not secret.
In GENERIC_KASLR, SVS would have to be improved a bit more: the 'jmp handle' instruction is actually secret, since it leaks the address of the section we are jumping into. By exploiting Meltdown on Intel, this theoretically allows a local user to reconstruct the address of the first text section. But given that our KASLR produces several texts, and that each section is not correlated with the others, the level of protection KASLR provides is still good.
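A hedged sketch of how such section-placement markers are commonly defined for assembly sources (the actual NetBSD definitions may differ); everything bracketed by them is emitted into .text.user, the only kernel text left mapped in the user page tables:

    /* Hypothetical definitions, in the style of an asm header. */
    #define TEXT_USER_BEGIN .pushsection .text.user, "ax"
    #define TEXT_USER_END   .popsection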