# 32a89764 | 08-Oct-2023 | ad <ad@NetBSD.org>
Ensure that an LWP that has taken a legitimate wakeup never produces an error code from sleepq_block(). Then, it's possible to make cv_signal() work as expected and only ever wake a single LWP.
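A minimal condvar(9) usage sketch of what this enables (the queue and its names are illustrative, not from the commit):

	static kmutex_t q_lock;
	static kcondvar_t q_cv;
	static int q_len;

	static void
	consume(void)
	{
		mutex_enter(&q_lock);
		while (q_len == 0)
			cv_wait(&q_cv, &q_lock);	/* a granted wakeup can no
							 * longer turn into an error */
		q_len--;
		mutex_exit(&q_lock);
	}

	static void
	produce(void)
	{
		mutex_enter(&q_lock);
		q_len++;
		cv_signal(&q_cv);			/* waking one LWP now suffices */
		mutex_exit(&q_lock);
	}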

# 0a6ca13b | 04-Oct-2023 | ad <ad@NetBSD.org>
Eliminate l->l_biglocks. Originally I think it had a use but these days a local variable will do.

# 6ed72b5f | 23-Sep-2023 | ad <ad@NetBSD.org>
- Simplify how priority boost for blocking in kernel is handled. Rather than setting it up at each site where we block, make it a property of syncobj_t (see the sketch after this list). Then, do not hang onto the priority boost until userret(); drop it as soon as the LWP is out of the run queue and onto a CPU. Holding onto it longer is of questionable benefit.
- This allows two members of lwp_t to be deleted, and mi_userret() to be simplified a lot (next step: trim it down to a single conditional).
- While here, constify syncobj_t and de-inline a bunch of small functions like lwp_lock() which turn out not to be small after all (I don't know why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and beyond what volatile does).
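A hedged sketch of the resulting shape of syncobj_t (the field name sobj_boostpri and its placement are assumptions for illustration; see sys/sys/syncobj.h for the real definition):

	typedef const struct syncobj {
		int	sobj_flag;
		pri_t	sobj_boostpri;	/* boost while blocked; dropped as soon
					 * as the LWP is off the run queue and
					 * on a CPU, not held until userret() */
		void	(*sobj_unsleep)(struct lwp *, bool);
		void	(*sobj_changepri)(struct lwp *, pri_t);
		void	(*sobj_lendpri)(struct lwp *, pri_t);
		struct lwp *(*sobj_owner)(wchan_t);
	} syncobj_t;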

# 1f610b7f | 27-Jun-2023 | pho <pho@NetBSD.org>
callout(9): Delete the unused member cc_cancel from struct callout_cpu
I see no reason why it should be there, and believe it's a leftover from some old code.

# 964811d3 | 27-Jun-2023 | pho <pho@NetBSD.org>
callout(9): Tidy up the condition for "callout is running on another LWP"
No functional changes.

# eef95bfb | 27-Jun-2023 | pho <pho@NetBSD.org>
callout(9): Fix panic() in callout_destroy() (kern/57226)
The culprit was callout_halt(). "(c->c_flags & CALLOUT_FIRED) != 0" wasn't the correct way to check if a callout is running. It failed to wait for a running callout to finish in the following scenario (a sketch of the corrected check follows the scenario):
1. cpu0 initializes a callout and schedules it.
2. cpu0 invokes callout_softlock() and fires the callout, setting the flag CALLOUT_FIRED.
3. The callout invokes callout_schedule() to re-schedule itself.
4. callout_schedule_locked() clears the flag CALLOUT_FIRED, and releases the lock.
5. Before the lock is re-acquired by callout_softlock(), cpu1 decides to destroy the callout. It first invokes callout_halt() to make sure the callout finishes running.
6. But since CALLOUT_FIRED has been cleared, callout_halt() thinks it's not running and therefore returns without invoking callout_wait().
7. cpu1 proceeds to invoke callout_destroy() while it's still running on cpu0. callout_destroy() detects that and panics.
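A hedged sketch of the corrected check (modelled on the description above; the helper name matches the tidy-up in 964811d3, but the details are assumptions):

	static inline bool
	callout_running_somewhere_else(callout_impl_t *c, struct callout_cpu *cc)
	{
		/* CALLOUT_FIRED is cleared by re-scheduling (step 4 above), so
		 * instead test whether this callout is the one the softclock on
		 * its CPU is executing right now, on an LWP other than ours. */
		return cc->cc_active == c && cc->cc_lwp != curlwp;
	}

callout_halt() then waits whenever this returns true, closing the window in steps 5-7.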

# 68020631 | 29-Oct-2022 | riastradh <riastradh@NetBSD.org>
callout(9): Mark new flags local unused for non-KDTRACE_HOOKS builds.
(feel free to add a new __dtrace_used annotation to make this more precise)
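A sketch of the annotation in question (the surrounding context is illustrative):

	static void
	callout_probe_example(callout_impl_t *c)
	{
		/* Consumed only by dtrace probes; without options KDTRACE_HOOKS
		 * those expand to nothing, so mark the local __unused to keep
		 * -Wunused-variable quiet on such builds. */
		const int flags __unused = c->c_flags;
	}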

# fb5b3d05 | 28-Oct-2022 | riastradh <riastradh@NetBSD.org>
callout(9): Sprinkle dtrace probes.

# b756b844 | 28-Oct-2022 | riastradh <riastradh@NetBSD.org>
callout(9): Nix trailing whitespace.
No functional change intended.

# 7baa9e8e | 29-Jun-2022 | riastradh <riastradh@NetBSD.org>
sleepq(9): Pass syncobj through to sleepq_block.
Previously the usage pattern was:
	sleepq_enter(sq, l, lock);		// locks l
	...
	sleepq_enqueue(sq, ..., sobj, ...);	// assumes l locked, sets l_syncobj
	...
(*)	sleepq_block(...);			// unlocks l
As long as l remains locked from sleepq_enter to sleepq_block, l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine whether the sleep is on a mutex in order to avoid creating ktrace context-switch records (which involves allocation which is forbidden in softint context, while taking and even sleeping for a mutex is allowed).
However, in turnstile_block, the logic at (*) also involves turnstile_lendpri, which sometimes unlocks and relocks l. At that point, another thread can swoop in and sleepq_remove l, which sets l_syncobj to sched_syncobj. If that happens, ktrcsw does what is forbidden -- tries to allocate a ktrace record for the context switch.
As an optimization, sleepq_block or turnstile_block could stop early if it detects that l_syncobj doesn't match -- we've already been requested to wake up at this point so there's no need to mi_switch. (And then it would be unnecessary to pass the syncobj through sleepq_block, because l_syncobj would remain stable.) But I'll leave that to another change.
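The resulting interface change, sketched (prototypes abbreviated per sleepq(9); the exact argument lists may differ):

	/* before */
	int	sleepq_block(int timo, bool catch_p);

	/* after: the caller's syncobj is passed explicitly, so sleepq_block()
	 * need not read l->l_syncobj, which a concurrent sleepq_remove() can
	 * rewrite once turnstile_lendpri() has dropped the LWP lock */
	int	sleepq_block(int timo, bool catch_p, syncobj_t *syncobj);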
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com

# fb698d22 | 30-Mar-2022 | riastradh <riastradh@NetBSD.org>
kern: Assert softint does not net acquire kernel locks.
This redoes previous change where I mistakenly used the CPU's biglock count, which is not necessarily stable -- the softint lwp may sleep on a mutex, and the lwp it interrupted may start up again and release the kernel lock, so by the time the softint lwp wakes up again and the softint function returns, the CPU may not be holding any kernel locks. But the softint lwp should never hold kernel locks except when it's in a (usually, non-MPSAFE) softint function.
Same with callout.
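A sketch of the assertion's logic (the handler invocation is illustrative; the real code lives in softint_execute()):

	const int nlocks = curlwp->l_blcnt;	/* the LWP's own count is stable
						 * across sleeps; the CPU's is not */
	(*sh->sh_func)(sh->sh_arg);		/* run the softint handler */
	KASSERT(curlwp->l_blcnt == nlocks);	/* no net kernel locks acquired */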

# 0fa3d1b3 | 30-Mar-2022 | riastradh <riastradh@NetBSD.org>
Revert "kern: Sprinkle biglock-slippage assertions."
Got the diagnostic information I needed from this, and it's holding up releng tests of everything else, so let's back this out until I need more diagnostics or track down the original source of the problem.

# e93349be | 30-Mar-2022 | riastradh <riastradh@NetBSD.org>
kern: Sprinkle biglock-slippage assertions.
We seem to have a poltergeist that occasionally messes with the biglock depth, but it's very hard to reproduce and only manifests as some other CPU spinning out on the kernel lock which is no good for diagnostics.

# d82fa5bd | 27-Jun-2020 | rin <rin@NetBSD.org>
Stop allocating struct cpu_info in BSS; no need to db_read_bytes() the whole cpu_info, just ci_data.cpu_callout is enough.
Save 1408 bytes of BSS for, e.g., aarch64.
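A sketch of the approach (DDB context; variable names illustrative, with ci pointing at the cpu_info under inspection):

	struct callout_cpu *cc;

	/* Read just the single pointer member out of the target cpu_info
	 * instead of copying the entire structure into a BSS buffer. */
	db_read_bytes((db_addr_t)&ci->ci_data.cpu_callout, sizeof(cc),
	    (char *)&cc);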

# 5125993e | 02-Jun-2020 | rin <rin@NetBSD.org>
Appease clang -Wtentative-definition-incomplete-type.
Now, both kernel and crash(8) build with clang for amd64 (and certainly other ports also).
Pointed out by joerg.

# fbbf6b42 | 31-May-2020 | rin <rin@NetBSD.org>
Stop allocating buffers dynamically in a DDB session, in order not to disturb on-going debugged state of kernel datastructures.
Since DDB is running on 1 CPU at a time, static buffers are enough.
Increase in BSS section is: 52552 bytes for amd64 (LP64), 9152 bytes for m68k (ILP32).
Requested by thorpej@ and mrg@. Also suggested by ryo@. Thanks!
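A sketch of the shape of the change (buffer names are illustrative):

	/* DDB runs on one CPU at a time, so one set of static scratch buffers
	 * replaces per-command allocation, at the BSS cost quoted above;
	 * nothing is allocated while kernel state is being inspected. */
	static struct callout_cpu ccb;
	static struct cpu_info    cib;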

# a97ea118 | 31-May-2020 | rin <rin@NetBSD.org>
Switch to db_alloc() from kmem_intr_alloc(9).
Fix build failure as a part of crash(8). Noticed by tnn@, thanks!

# 851ed851 | 31-May-2020 | rin <rin@NetBSD.org>
db_show_callout(): struct callout_cpu and cpu_info are too much for stack.
XXX DDB can be running in the interrupt context, e.g., when activated from console. Therefore, use kmem_intr_alloc(9) instead of kmem_alloc(9).
Frame size, e.g. for m68k, becomes: 9212 (oops!) --> 0
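A sketch of that approach (since superseded by the static buffers of fbbf6b42 above):

	struct callout_cpu *cc;

	/* KM_NOSLEEP allocation is legal from interrupt context, unlike
	 * kmem_alloc(9) with KM_SLEEP, and keeps the 9 KB+ temporaries
	 * off the stack. */
	cc = kmem_intr_alloc(sizeof(*cc), KM_NOSLEEP);
	if (cc == NULL)
		return;			/* out of memory; bail out of the command */
	/* ... format the callout table from *cc ... */
	kmem_intr_free(cc, sizeof(*cc));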

# 46a9878a | 19-Apr-2020 | ad <ad@NetBSD.org>
Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible waits with turnstiles (not currently done).

# 983fd9cc | 13-Apr-2020 | maxv <maxv@NetBSD.org>
hardclock_ticks -> getticks()
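The shape of the conversion, sketched (the deadline arithmetic is illustrative):

	/* before: direct read of the global */
	deadline = hardclock_ticks + mstohz(100);

	/* after: through the accessor */
	deadline = getticks() + mstohz(100);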

# 6ec02221 | 21-Mar-2020 | ad <ad@NetBSD.org>
callout_destroy(): change output from a couple of assertions so it's clear what they are checking for (callout being destroyed while pending/running).

# 5f882f34 | 23-Jan-2020 | ad <ad@NetBSD.org>
callout_halt():
- It's a common design pattern for callouts to re-schedule themselves, so check after waiting and put a stop to it again if needed (sketched below).
- Add comments.
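A hedged sketch of the re-check (helper names are illustrative, not the actual kern_timeout.c internals):

	for (;;) {
		callout_cancel_pending(c);	/* stop a pending instance */
		if (!callout_running_elsewhere(c))
			break;			/* fully halted */
		callout_wait_for_handler(c);	/* the handler may have re-armed
						 * the callout while we waited;
						 * loop and stop it again */
	}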

# 298a9247 | 21-Nov-2019 | ad <ad@NetBSD.org>
Break the slow path for callout_halt() out into its own routine. No functional change.

# dd8f5519 | 10-Mar-2019 | kre <kre@NetBSD.org>
Undo previous, in the name of "defined" behaviour, it breaks things.
This is all explained in the comment at the head of the file:
 * Some of the "math" in here is a bit tricky.  We have to beware of
 * wrapping ints.
 *
 * [...] but c->c_time can
 * be positive or negative so comparing it with anything is dangerous.
In particular, "if (c->c_time > ticks)" is simply wrong.
 * The only way we can use the c->c_time value in any predictable way is
 * when we calculate how far in the future `to' will timeout -
 * "c->c_time - c->c_cpu->cc_ticks".  The result will always be positive
 * for future timeouts and 0 or negative for due timeouts.
Go back to the old way. But write the calculation of delta slightly differently which will hopefully appease KUBsan. Perhaps. In any case, this code works on any system that NetBSD has any hope of ever running on, whatever the C standards say is "defined" behaviour.
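The delta calculation as described, sketched (the in-tree expression may differ in detail):

	/* Do the wrapping subtraction in unsigned arithmetic, then let the
	 * sign of the result classify the timeout; this is well-defined as
	 * far as KUBsan is concerned, yet compiles to the same
	 * two's-complement subtraction. */
	int delta = (int)((unsigned)c->c_time - (unsigned)cc->cc_ticks);

	if (delta <= 0) {
		/* due (or overdue): run it */
	} else {
		/* fires `delta' ticks in the future */
	}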

# bed39f81 | 08-Jul-2018 | kamil <kamil@NetBSD.org>
Try to avoid signed integer overflow in callout_softclock()
The delta operation (c->c_time - ticks) is documented as safe, however it still can cause overflow in narrow case scenarios.
Try to avoid overflow/underflow, or at least make it less frequent, with a direct comparison of c->c_time and ticks. Perform the operation of subtraction only when c->c_time > ticks.
sys/kern/kern_timeout.c:720:9, signed integer overflow: -2147410738 - 72912 cannot be represented in type 'int'
Detected with Kernel Undefined Behavior Sanitizer.
Patch suggested by <Riastradh>
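A sketch of that (later reverted, see dd8f5519 above) approach:

	/* Subtract only when the comparison says the result is positive --
	 * but as dd8f5519 above explains, the comparison itself is unsafe
	 * once the tick counter wraps, which is why this was undone. */
	if (c->c_time > ticks)
		delta = c->c_time - ticks;	/* no overflow in this branch */
	else
		delta = 0;			/* treat as due */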