History log of /openbsd-src/sys/kern/kern_sched.c (Results 51 – 75 of 103)
Revision Date Author Comments
# 7befbc1b 05-Oct-2018 cheloha <cheloha@openbsd.org>

Revert KERN_CPTIME2 ENODEV changes in kernel and userspace.

ok kettenis deraadt


# 3634178a 26-Sep-2018 cheloha <cheloha@openbsd.org>

KERN_CPTIME2: set ENODEV if the CPU is offline.

This lets userspace distinguish between idle CPUs and those that are
not schedulable because hw.smt=0.

A subsequent commit probably needs to add documentation for this
to sysctl.2 (and perhaps elsewhere) after the dust settles.

Also included here are changes to systat(1) and top(1) that account
for the ENODEV case and adjust behavior accordingly:

- systat(1)'s cpu view prints placeholder marks ('-') instead of
percentages for each state if the given CPU is offline.

- systat(1)'s vmstat view checks for offline CPUs when computing the
machine state total and excludes them, so the CPU usage graph
only represents the states for online CPUs.

- top(1) does not draw CPU rows for offline CPUs when the view is
redrawn. If CPUs "go offline", percentages for each state are
replaced by placeholder marks ('-'); the view will need to be
redrawn to remove these rows. If CPUs "go online" the view will
need to be redrawn to show these new CPUs. In "combined CPU" mode,
the count and the state totals only represent online CPUs.

Ports using KERN_CPTIME2 will need to be updated. The changes
described above to make systat(1) and top(1) aware of the ENODEV
case *and* gracefully handle a changing HW_NCPUONLINE while the
application is running are not necessarily appropriate for each
and every port.

The changes described above are so extensive in part to demonstrate
one way a program *might* be made robust to changing CPU availability.
In particular, changing hw.smt after boot is an extremely rare event,
and this needs to be weighed when updating ports.

The logic needed to account for the KERN_CPTIME2 ENODEV case is
very roughly:

	if (sysctl(...) == -1) {
		if (errno != ENODEV) {
			/* Actual error occurred. */
		} else {
			/* CPU is offline. */
		}
	} else {
		/* CPU is online and CPU states were set by sysctl(2). */
	}

Prompted by deraadt@. Basic idea for ENODEV from kettenis@. Discussed at
length with kettenis@. Additional testing by tb@.

No complaints from hackers@ after a week.

ok kettenis@, "I think you should commit [now]" deraadt@



# c71ddef4 12-Jul-2018 cheloha <cheloha@openbsd.org>

Add hw.ncpuonline to count the number of online CPUs.

The introduction of hw.smt means that logical CPUs can be disabled
after boot and prior to suspend/resume. If hw.smt=0 (the default),
there needs to be a way to count the number of hardware threads
available on the system at any given time.

So, import HW_NCPUONLINE/hw.ncpuonline from NetBSD and document it.
hw.ncpu becomes equal to the number of CPUs given to sched_init_cpu()
during boot, while hw.ncpuonline is equal to the number of CPUs available
to the scheduler in the cpuset "sched_all_cpus". Set _SC_NPROCESSORS_ONLN
equal to this new sysctl and keep _SC_NPROCESSORS_CONF equal to hw.ncpu.

This is preferable to adding a new sysctl to count the number of
configured CPUs and keeping hw.ncpu equal to the number of online
CPUs because such a change would break software in the ecosystem
that relies on HW_NCPU/hw.ncpu to measure CPU usage and the like.
Such software in base includes top(1), systat(1), and snmpd(8),
and perhaps others.

We don't need additional locking to count the cardinality of a cpuset
in this case because the only interfaces that can modify said cardinality
are sysctl(2) and ioctl(2), both of which are under the KERNEL_LOCK.

Software using HW_NCPU/hw.ncpu to determine optimal parallelism will need
to be updated to use HW_NCPUONLINE/hw.ncpuonline. Until then, such software
may perform suboptimally. However, most changes will be similar to the
change included here for libcxx's std::thread::hardware_concurrency():
using HW_NCPUONLINE in lieu of HW_NCPU should be sufficient for determining
optimal parallelism for most software if the change to _SC_NPROCESSORS_ONLN
is insufficient.

Prompted by deraadt. Discussed at length with kettenis, deraadt, and sthen.
Lots of patch tweaks from kettenis.

ok kettenis, "proceed" deraadt



# 3183901a 07-Jul-2018 visa <visa@openbsd.org>

Release the kernel lock fully on thread exit. This prevents a locking
error that would happen otherwise when a traced and stopped
multithreaded process is forced to exit. The error shows up as a kernel
panic when WITNESS is enabled. Without WITNESS, the error causes
a system hang.

sched_exit() expected that a single KERNEL_UNLOCK() would release
the lock completely. That assumption is wrong when an exit happens
through the signal tracing logic:

sched_exit
exit1
single_thread_check
single_thread_set
issignal <-- KERNEL_LOCK()
userret <-- KERNEL_LOCK()
syscall

The error is a regression of r1.216 of kern_sig.c.

Panic reported and fix tested by Laurence Tratt
OK mpi@



# 73dc9ed4 30-Jun-2018 kettenis <kettenis@openbsd.org>

Don't steal processes from other CPUs if we're not scheduling processes on
a CPU.

ok deraadt@


# 96c11352 19-Jun-2018 kettenis <kettenis@openbsd.org>

SMT (Simultaneous Multi Threading) implementations typically share
TLBs and L1 caches between threads. This can make cache timing
attacks a lot easier and we strongly suspect that this will make
several spectre-class bugs exploitable, especially on Intel's SMT
implementation, which is better known as Hyper-threading. We really
should not run different security domains on different processor
threads of the same core. Unfortunately changing our scheduler to
take this into account is far from trivial. Since many modern
machines no longer provide the ability to disable Hyper-threading in
the BIOS setup, provide a way to disable the use of additional
processor threads in our scheduler. And since we suspect there are
serious risks, we disable them by default. This can be controlled
through a new hw.smt sysctl. For now this only works on Intel CPUs
when running OpenBSD/amd64. But we're planning to extend this feature
to CPUs from other vendors and other hardware architectures.

Note that SMT doesn't necessarily have a positive effect on performance;
it highly depends on the workload. In all likelihood it will actually
slow down most workloads if you have a CPU with more than two cores.

ok deraadt@



# 0e4e5752 14-Dec-2017 dlg <dlg@openbsd.org>

make sched_barrier use cond_wait/cond_signal.

previously the code was using a percpu flag to manage the sleeps/wakeups,
which means multiple threads waiting for a barrier on a cpu could
race. moving to a cond struct on the stack fixes this.

while here, get rid of the sbar taskq and just use systqmp instead.
the barrier tasks are short, so there's no real downside.

ok mpi@



# 64a91b3e 28-Nov-2017 visa <visa@openbsd.org>

Raise the IPL of the sbar taskq to avoid lock order issues
with the kernel lock.

Fixes a deadlock seen by Hrvoje Popovski and dhill@.
OK mpi@, dhill@


# 79a514fc 12-Feb-2017 guenther <guenther@openbsd.org>

Split up fork1():
- FORK_THREAD handling is a totally separate function, thread_fork(),
that is only used by sys___tfork() and which loses the flags, func,
arg, and newprocp parameters and gains tcb parameter to guarantee
the new thread's TCB is set before the creating thread returns
- fork1() loses its stack and tidptr parameters
Common bits factor out:
- struct proc allocation and initialization moves to thread_new()
- maxthread handling moves to fork_check_maxthread()
- setting the new thread running moves to fork_thread_start()
The MD cpu_fork() function swaps its unused stacksize parameter for
a tcb parameter.

luna88k testing by aoyama@, alpha testing by dlg@
ok mpi@



# 8fda72b7 21-Jan-2017 guenther <guenther@openbsd.org>

p_comm is the process's command and isn't per thread, so move it from
struct proc to struct process.

ok deraadt@ kettenis@


# 336ad790 03-Jun-2016 kettenis <kettenis@openbsd.org>

Allow pegged processes on secondary CPUs to continue to be scheduled when
halting a CPU. Necessary for intr_barrier(9) to work when interrupts are
targeted at secondary CPUs.

ok mpi@, mikeb@ (a while back)



# b2602131 17-Mar-2016 mpi <mpi@openbsd.org>

Replace curcpu_is_idle() by cpu_is_idle() and use it instead of rolling
our own.

From Michal Mazurek, ok mmcc@


# 6dc79ad1 23-Dec-2015 kettenis <kettenis@openbsd.org>

One "sbar" taskq is enough.

ok visa@


# 581334fc 17-Dec-2015 kettenis <kettenis@openbsd.org>

Make the cost of moving a process to the primary cpu a bit higher. This is
the CPU that handles most hardware interrupts but we don't account for that
in any way in the scheduler. So processes (and kernel threads) that are
unlucky enough to end up on this CPU will get less CPU cycles than those
running on other CPUs. This is especially true for the softnet taskq.
There, network interrupts will prevent the softnet taskq from running. This
means that the more packets we receive, the less packets we can actually
process and/or forward. This is why "unlocking" network drivers actually
decreases the forwarding performance. This diff restores most of the lost
performance by making it less likely that the softnet taskq ends up on the
same CPU that handles network interrupts.

Tested by Hrvoje Popovski

ok mpi@, deraadt@



# fc08c356 16-Oct-2015 mpi <mpi@openbsd.org>

Make sched_barrier() use its own task queue to avoid deadlocks.

Prevent a deadlock from occurring when intr_barrier() is called from
a non-primary CPU in the watchdog task, also enqueued on ``systq''.

ok kettenis@



# 077ff916 20-Sep-2015 kettenis <kettenis@openbsd.org>

Short circuit if we're running on the CPU that we want to sync with. Fixes
suspend on machines with em(4) now that it uses intr_barrier(9).

ok krw@


# f30f8d91 13-Sep-2015 kettenis <kettenis@openbsd.org>

Introduce sched_barrier(9), an interface that acts as a scheduler barrier in
the sense that it guarantees that the specified CPU went through the
scheduler. This also guarantees that interrupt handlers running on that CPU
will have finished when sched_barrier() returns.

ok miod@, guenther@



# 21dab745 14-Mar-2015 jsg <jsg@openbsd.org>

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


# 5deb3491 24-Sep-2014 mpi <mpi@openbsd.org>

Keep under #ifdef MULTIPROCESSOR the code that deals with SPCF_SHOULDHALT
and SPCF_HALTED, these flags only make sense on secondary CPUs which are
unlikely to be present on a SP kernel.

ok kettenis@


# 93a6d383 26-Jul-2014 kettenis <kettenis@openbsd.org>

If we're stopping a secondary cpu, don't let sched_choosecpu() short-circuit
and return the current CPU, otherwise sched_stop_secondary_cpus()
will spin forever trying to empty its run queues. Fixes hangs during suspend
that many people reported over the last couple of days.

ok bcook@, guenther@



# d3a76966 13-Jul-2014 matthew <matthew@openbsd.org>

Fix sched_stop_secondary_cpus() to properly drain CPUs

TAILQ_FOREACH() isn't safe to use in sched_chooseproc() to iterate
over the run queues because within the loop body we remove the threads
from

Fix sched_stop_secondary_cpus() to properly drain CPUs

TAILQ_FOREACH() isn't safe to use in sched_chooseproc() to iterate
over the run queues because within the loop body we remove the threads
from their run queues and reinsert them elsewhere. As a result, we
end up only draining the first thread of each run queue rather than
all of them.

ok kettenis



# ed557a36 04-May-2014 guenther <guenther@openbsd.org>

Add PS_SYSTEM, the process-level mirror of the thread-level P_SYSTEM,
and FORK_SYSTEM as a flag to set them. This eliminates needing to
peek into other processes' threads in various places. Inspired by NetBSD.

ok miod@ matthew@



# 34b8a7e2 12-Feb-2014 guenther <guenther@openbsd.org>

Eliminate the exit sig handling, which was only invokable via the
Linux-compat clone() syscall when *not* using CLONE_THREAD. pirofti@
confirms Opera runs in compat without this, so out it goes; one less hair
to choke on in kern_exit.c

ok tedu@ pirofti@



# a48ed3dd 06-Jun-2013 haesbaert <haesbaert@openbsd.org>

Prevent idle thread from being stolen on startup.

There is a race condition which might trigger a case where two cpus try
to run the same idle thread.

The problem arises when one cpu steals the idle proc of another cpu and
this other cpu ends up running the idle thread via spc->spc_idleproc,
resulting in two cpus trying to cpu_switchto(idleX).

On startup, idle procs are scattered around different runqueues, the
decision for scheduling is:

1 look at my runqueue.
2 if empty, look at other dudes runqueue.
3 if empty, select idle proc via spc->spc_idleproc.

The problem is that cpu0's idle0 might be running on cpu1 due to step 1
or 2 and cpu0 hits step 3.

So cpu0 will select idle0, while cpu1 is in fact running it already.

The solution is to never place idle on a runqueue, therefore being
only selectable through spc->spc_idleproc.

This race can be more easily triggered on a HT cpu on virtualized
environments, where the guest more often than not doesn't have the cpu
for itself, so timing gets shuffled.

ok tedu@ guenther@
go ahead after t2k13 deraadt@



# 08be1c18 03-Jun-2013 guenther <guenther@openbsd.org>

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@

