History log of /openbsd-src/sys/kern/kern_sched.c (Results 51 – 75 of 103)
Revision Date Author Comments
# 7befbc1b 05-Oct-2018 cheloha <cheloha@openbsd.org>

Revert KERN_CPTIME2 ENODEV changes in kernel and userspace.

ok kettenis deraadt


# 3634178a 26-Sep-2018 cheloha <cheloha@openbsd.org>

KERN_CPTIME2: set ENODEV if the CPU is offline.

This lets userspace distinguish between idle CPUs and those that are
not schedulable because hw.smt=0.

A subsequent commit probably needs to add documentation for this
to sysctl.2 (and perhaps elsewhere) after the dust settles.

Also included here are changes to systat(1) and top(1) that account
for the ENODEV case and adjust behavior accordingly:

- systat(1)'s cpu view prints placeholder marks ('-') instead of
percentages for each state if the given CPU is offline.

- systat(1)'s vmstat view checks for offline CPUs when computing the
machine state total and excludes them, so the CPU usage graph
only represents the states for online CPUs.

- top(1) does not draw CPU rows for offline CPUs when the view is
redrawn. If CPUs "go offline", percentages for each state are
replaced by placeholder marks ('-'); the view will need to be
redrawn to remove these rows. If CPUs "go online" the view will
need to be redrawn to show these new CPUs. In "combined CPU" mode,
the count and the state totals only represent online CPUs.

Ports using KERN_CPTIME2 will need to be updated. The changes
described above to make systat(1) and top(1) aware of the ENODEV
case *and* gracefully handle a changing HW_NCPUONLINE while the
application is running are not necessarily appropriate for each
and every port.

The changes described above are so extensive in part to demonstrate
one way a program *might* be made robust to changing CPU availability.
In particular, changing hw.smt after boot is an extremely rare event,
and this needs to be weighed when updating ports.

The logic needed to account for the KERN_CPTIME2 ENODEV case is
very roughly:

	if (sysctl(...) == -1) {
		if (errno != ENODEV) {
			/* Actual error occurred. */
		} else {
			/* CPU is offline. */
		}
	} else {
		/* CPU is online and CPU states were set by sysctl(2). */
	}

Prompted by deraadt@. Basic idea for ENODEV from kettenis@. Discussed at
length with kettenis@. Additional testing by tb@.

No complaints from hackers@ after a week.

ok kettenis@, "I think you should commit [now]" deraadt@



# c71ddef4 12-Jul-2018 cheloha <cheloha@openbsd.org>

Add hw.ncpuonline to count the number of online CPUs.

The introduction of hw.smt means that logical CPUs can be disabled
after boot and prior to suspend/resume. If hw.smt=0 (the default),
there needs to be a way to count the number of hardware threads
available on the system at any given time.

So, import HW_NCPUONLINE/hw.ncpuonline from NetBSD and document it.
hw.ncpu becomes equal to the number of CPUs given to sched_init_cpu()
during boot, while hw.ncpuonline is equal to the number of CPUs available
to the scheduler in the cpuset "sched_all_cpus". Set _SC_NPROCESSORS_ONLN
equal to this new sysctl and keep _SC_NPROCESSORS_CONF equal to hw.ncpu.

This is preferable to adding a new sysctl to count the number of
configured CPUs and keeping hw.ncpu equal to the number of online
CPUs because such a change would break software in the ecosystem
that relies on HW_NCPU/hw.ncpu to measure CPU usage and the like.
Such software in base includes top(1), systat(1), and snmpd(8),
and perhaps others.

We don't need additional locking to count the cardinality of a cpuset
in this case because the only interfaces that can modify said cardinality
are sysctl(2) and ioctl(2), both of which are under the KERNEL_LOCK.

Software using HW_NCPU/hw.ncpu to determine optimal parallelism will need
to be updated to use HW_NCPUONLINE/hw.ncpuonline. Until then, such software
may perform suboptimally. However, most changes will be similar to the
change included here for libcxx's std::thread::hardware_concurrency():
using HW_NCPUONLINE in lieu of HW_NCPU should be sufficient for determining
optimal parallelism for most software if the change to _SC_NPROCESSORS_ONLN
is insufficient.

Prompted by deraadt. Discussed at length with kettenis, deraadt, and sthen.
Lots of patch tweaks from kettenis.

ok kettenis, "proceed" deraadt



# 3183901a 07-Jul-2018 visa <visa@openbsd.org>

Release the kernel lock fully on thread exit. This prevents a locking
error that would happen otherwise when a traced and stopped
multithreaded process is forced to exit. The error shows up as a kernel
panic when WITNESS is enabled. Without WITNESS, the error causes
a system hang.

sched_exit() expected that a single KERNEL_UNLOCK() would release
the lock completely. That assumption is wrong when an exit happens
through the signal tracing logic:

sched_exit
exit1
single_thread_check
single_thread_set
issignal <-- KERNEL_LOCK()
userret <-- KERNEL_LOCK()
syscall

The error is a regression of r1.216 of kern_sig.c.

Panic reported and fix tested by Laurence Tratt
OK mpi@



# 73dc9ed4 30-Jun-2018 kettenis <kettenis@openbsd.org>

Don't steal processes from other CPUs if we're not scheduling processes on
a CPU.

ok deraadt@


# 96c11352 19-Jun-2018 kettenis <kettenis@openbsd.org>

SMT (Simultaneous Multi Threading) implementations typically share
TLBs and L1 caches between threads. This can make cache timing
attacks a lot easier and we strongly suspect that this will make
several spectre-class bugs exploitable, especially on Intel's SMT
implementation, which is better known as Hyper-threading. We really
should not run different security domains on different processor
threads of the same core. Unfortunately changing our scheduler to
take this into account is far from trivial. Since many modern
machines no longer provide the ability to disable Hyper-threading in
the BIOS setup, provide a way to disable the use of additional
processor threads in our scheduler. And since we suspect there are
serious risks, we disable them by default. This can be controlled
through a new hw.smt sysctl. For now this only works on Intel CPUs
when running OpenBSD/amd64. But we're planning to extend this feature
to CPUs from other vendors and other hardware architectures.

Note that SMT doesn't necessarily have a positive effect on performance;
it highly depends on the workload. In all likelihood it will actually
slow down most workloads if you have a CPU with more than two cores.

ok deraadt@



# 0e4e5752 14-Dec-2017 dlg <dlg@openbsd.org>

make sched_barrier use cond_wait/cond_signal.

previously the code was using a percpu flag to manage the sleeps/wakeups,
which means multiple threads waiting for a barrier on a cpu could
race. moving to a cond struct on the stack fixes this.

while here, get rid of the sbar taskq and just use systqmp instead.
the barrier tasks are short, so there's no real downside.

ok mpi@



# 64a91b3e 28-Nov-2017 visa <visa@openbsd.org>

Raise the IPL of the sbar taskq to avoid lock order issues
with the kernel lock.

Fixes a deadlock seen by Hrvoje Popovski and dhill@.
OK mpi@, dhill@


# 79a514fc 12-Feb-2017 guenther <guenther@openbsd.org>

Split up fork1():
- FORK_THREAD handling is a totally separate function, thread_fork(),
that is only used by sys___tfork() and which loses the flags, func,
arg, and newprocp parameters and gains tcb parameter to guarantee
the new thread's TCB is set before the creating thread returns
- fork1() loses its stack and tidptr parameters
Common bits factor out:
- struct proc allocation and initialization moves to thread_new()
- maxthread handling moves to fork_check_maxthread()
- setting the new thread running moves to fork_thread_start()
The MD cpu_fork() function swaps its unused stacksize parameter for
a tcb parameter.

luna88k testing by aoyama@, alpha testing by dlg@
ok mpi@



# 8fda72b7 21-Jan-2017 guenther <guenther@openbsd.org>

p_comm is the process's command and isn't per thread, so move it from
struct proc to struct process.

ok deraadt@ kettenis@


# 336ad790 03-Jun-2016 kettenis <kettenis@openbsd.org>

Allow pegged processes on secondary CPUs to continue to be scheduled when
halting a CPU. Necessary for intr_barrier(9) to work when interrupts are
targeted at secondary CPUs.

ok mpi@, mikeb@ (a while back)



# b2602131 17-Mar-2016 mpi <mpi@openbsd.org>

Replace curcpu_is_idle() by cpu_is_idle() and use it instead of rolling
our own.

From Michal Mazurek, ok mmcc@


# 6dc79ad1 23-Dec-2015 kettenis <kettenis@openbsd.org>

One "sbar" taskq is enough.

ok visa@


# 581334fc 17-Dec-2015 kettenis <kettenis@openbsd.org>

Make the cost of moving a process to the primary cpu a bit higher. This is
the CPU that handles most hardware interrupts but we don't account for that
in any way in the scheduler. So processes (and kernel threads) that are
unlucky enough to end up on this CPU will get less CPU cycles than those
running on other CPUs. This is especially true for the softnet taskq.
There, network interrupts will prevent the softnet taskq from running. This
means that the more packets we receive, the less packets we can actually
process and/or forward. This is why "unlocking" network drivers actually
decreases the forwarding performance. This diff restores most of the lost
performance by making it less likely that the softnet taskq ends up on the
same CPU that handles network interrupts.

Tested by Hrvoje Popovski

ok mpi@, deraadt@



# fc08c356 16-Oct-2015 mpi <mpi@openbsd.org>

Make sched_barrier() use its own task queue to avoid deadlocks.

Prevent a deadlock from occurring when intr_barrier() is called from
a non-primary CPU in the watchdog task, also enqueued on ``systq''.

ok kettenis@



# 077ff916 20-Sep-2015 kettenis <kettenis@openbsd.org>

Short circuit if we're running on the CPU that we want to sync with. Fixes
suspend on machines with em(4) now that it uses intr_barrier(9).

ok krw@


# f30f8d91 13-Sep-2015 kettenis <kettenis@openbsd.org>

Introduce sched_barrier(9), an interface that acts as a scheduler barrier in
the sense that it guarantees that the specified CPU went through the
scheduler. This also guarantees that interrupt handlers running on that CPU
will have finished when sched_barrier() returns.

ok miod@, guenther@



# 21dab745 14-Mar-2015 jsg <jsg@openbsd.org>

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


# 5deb3491 24-Sep-2014 mpi <mpi@openbsd.org>

Keep under #ifdef MULTIPROCESSOR the code that deals with SPCF_SHOULDHALT
and SPCF_HALTED, these flags only make sense on secondary CPUs which are
unlikely to be present on a SP kernel.

ok kettenis@


# 93a6d383 26-Jul-2014 kettenis <kettenis@openbsd.org>

If we're stopping a secondary cpu, don't let sched_choosecpu() short-circuit
and return the current CPU, otherwise sched_stop_secondary_cpus()
will spin forever trying to empty its run queues. Fixes hangs during suspend
that many people reported over the last couple of days.

ok bcook@, guenther@



# d3a76966 13-Jul-2014 matthew <matthew@openbsd.org>

Fix sched_stop_secondary_cpus() to properly drain CPUs

TAILQ_FOREACH() isn't safe to use in sched_chooseproc() to iterate
over the run queues because within the loop body we remove the threads
from

Fix sched_stop_secondary_cpus() to properly drain CPUs

TAILQ_FOREACH() isn't safe to use in sched_chooseproc() to iterate
over the run queues because within the loop body we remove the threads
from their run queues and reinsert them elsewhere. As a result, we
end up only draining the first thread of each run queue rather than
all of them.

ok kettenis



# ed557a36 04-May-2014 guenther <guenther@openbsd.org>

Add PS_SYSTEM, the process-level mirror of the thread-level P_SYSTEM,
and FORK_SYSTEM as a flag to set them. This eliminates needing to
peek into other processes' threads in various places. Inspired by NetBSD.

ok miod@ matthew@



# 34b8a7e2 12-Feb-2014 guenther <guenther@openbsd.org>

Eliminate the exit sig handling, which was only invokable via the
Linux-compat clone() syscall when *not* using CLONE_THREAD. pirofti@
confirms Opera runs in compat without this, so out it goes; one less hair
to choke on in kern_exit.c

ok tedu@ pirofti@



# a48ed3dd 06-Jun-2013 haesbaert <haesbaert@openbsd.org>

Prevent idle thread from being stolen on startup.

There is a race condition which might trigger a case where two cpus try
to run the same idle thread.

The problem arises when one cpu steals the idle proc of another cpu and
this other cpu ends up running the idle thread via spc->spc_idleproc,
resulting in two cpus trying to cpu_switchto(idleX).

On startup, idle procs are scattered around different runqueues, the
decision for scheduling is:

1 look at my runqueue.
2 if empty, look at other dudes runqueue.
3 if empty, select idle proc via spc->spc_idleproc.

The problem is that cpu0's idle0 might be running on cpu1 due to step 1
or 2 and cpu0 hits step 3.

So cpu0 will select idle0, while cpu1 is in fact running it already.

The solution is to never place idle on a runqueue, therefore being
only selectable through spc->spc_idleproc.

This race can be more easily triggered on a HT cpu on virtualized
environments, where the guest more often than not doesn't have the cpu
for itself, so timing gets shuffled.

ok tedu@ guenther@
go ahead after t2k13 deraadt@



# 08be1c18 03-Jun-2013 guenther <guenther@openbsd.org>

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@

