History log of /dflybsd-src/sys/kern/subr_param.c (Results 1 – 25 of 39)
Revision  Date  Author  Comments
# b272101a 30-Oct-2023 Aaron LI <aly@aaronly.me>

Various minor whitespace cleanups

Accumulated along the way.


# 09195ea1 06-Nov-2023 Aaron LI <aly@aaronly.me>

mbuf(9): Remove obsolete and unused 'kern.ipc.mbuf_wait' sysctl

This sysctl MIB has been obsolete and unused since the re-implementation
of mbuf allocation using objcache(9) in commit 7b6f875 (year 2005).
Remove this sysctl MIB.

Update the mbuf.9 manpage about the 'how' argument to avoid ambiguity,
i.e., MGET()/m_get() etc. would not fail if how=M_WAITOK.
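
As a hedged illustration of that 'how' contract (the caller and error
handling below are hypothetical; kernel context assumed):

    #include <sys/param.h>
    #include <sys/errno.h>
    #include <sys/mbuf.h>

    /* Only the M_NOWAIT path needs a NULL check. */
    static int
    example_mbuf_alloc(void)
    {
            struct mbuf *m;

            m = m_get(M_WAITOK, MT_DATA);   /* sleeps until it succeeds */
            m_freem(m);

            m = m_get(M_NOWAIT, MT_DATA);   /* may return NULL under pressure */
            if (m == NULL)
                    return (ENOBUFS);
            m_freem(m);
            return (0);
    }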



Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.3.0, v6.0.1
# 45dd33f2 01-Jul-2021 Aaron LI <aly@aaronly.me>

kernel: Detect NVMM hypervisor

The sysctl 'kern.vmm_guest' now reports 'nvmm' instead of 'unknown'
when running under the NVMM hypervisor.

Suggested-by: swildner
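
A minimal userland check, assuming kern.vmm_guest is the string sysctl
described above:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
            char buf[32];
            size_t len = sizeof(buf);

            if (sysctlbyname("kern.vmm_guest", buf, &len, NULL, 0) == -1) {
                    perror("sysctlbyname");
                    return (1);
            }
            printf("kern.vmm_guest = %s\n", buf);   /* "nvmm" under NVMM */
            return (0);
    }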


Revision tags: v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1
# 880fb308 14-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Offset the stathz systimer by 50% of the hz timer

* Offset the initial starting point of the stathz systimer by
50% of the hz timer, so they do not interfere with each other
if they happen to be set to the same frequency.

* Change the default stathz frequency to hz + 1 (101hz) so it
slides across the tick interval window.
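
A hedged sketch of the arithmetic only (the real work happens in the
systimer code; values here are illustrative):

    #include <stdio.h>

    int
    main(void)
    {
            int hz = 100;
            int stathz = hz + 1;                    /* new default: 101hz */
            long hz_period_us = 1000000L / hz;      /* 10000us per hz tick */
            long stat_offset_us = hz_period_us / 2; /* start stathz 50% later */

            printf("stathz starts %ldus after the hz timer, at %dhz\n",
                stat_offset_us, stathz);
            return (0);
    }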



Revision tags: v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3
# 4837705e 03-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

pthreads and kernel - change MAP_STACK operation

* Only allow new mmap()'s to intrude on ungrown MAP_STACK areas when
MAP_TRYFIXED is specified. This was not working as intended before.
Adjust the manual page to be more clear.

* Make kern.maxssiz (the maximum user stack size) visible via sysctl.

* Add kern.maxthrssiz, indicating the approximate space for placement
of pthread stacks. This defaults to 128GB.

* The old libthread_xu stack code did use TRYFIXED and will work
with the kernel changes, but change how it works to not assume
that the user stack should suddenly be limited to the pthread stack
size (~2MB).

Just use a normal mmap() now without TRYFIXED and a hint based on
kern.maxthrssiz (defaults to 512GB), calculating a starting address
hint that is (_usrstack - maxssiz - maxthrssiz).

* Adjust procfs to report MAP_STACK segments as 'STK'.
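
A hedged userland sketch of the hint calculation quoted above; the
addresses and sizes are illustrative, not the kernel's:

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
            uintptr_t usrstack   = 0x0000800000000000UL; /* hypothetical top */
            uintptr_t maxssiz    = 512UL << 20;          /* kern.maxssiz */
            uintptr_t maxthrssiz = 128UL << 30;          /* kern.maxthrssiz */
            uintptr_t hint = usrstack - maxssiz - maxthrssiz;

            /* a plain mmap(hint, ...) without MAP_TRYFIXED starts here */
            printf("pthread stack hint: %p\n", (void *)hint);
            return (0);
    }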



Revision tags: v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1
# 6a8bb22d 05-May-2018 Sascha Wildner <saw@online.de>

Fix a few typos across the tree.


Revision tags: v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1
# 9c79791a 14-Aug-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Increase flock() / posix-lock limit

* Change the scaling for kern.maxposixlocksperuid to be based on maxproc
instead of maxusers.

* Increase the default cap to approximately 4 * maxproc. This can lead
to what may seem to be large numbers, but certain applications might use
posix locks heavily and overhead is low. We generally want the cap to
be large enough that it never has to be overridden.
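
The scaling itself is simple; a hedged sketch with an example value
(the real code lives in the kernel's parameter autoconfiguration):

    #include <stdio.h>

    int
    main(void)
    {
            int maxproc = 32768;                    /* example autoconfigured value */
            int maxposixlocksperuid = 4 * maxproc;  /* new cap, ~4 * maxproc */

            printf("kern.maxposixlocksperuid ~ %d\n", maxposixlocksperuid);
            return (0);
    }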



# bb471226 12-Aug-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Change maxproc cap calculation

* Increase the calculation for the maxproc cap based on physical ram.
This allows a machine with 128GB of ram to maxproc past a million,
though it should be noted that PIDs are only 6-digits, so for now
a million processes is the actual limit.
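
A hedged sketch of the idea; the per-GB constant is invented, not the
kernel's actual formula:

    #include <stdio.h>

    int
    main(void)
    {
            long ram_gb = 128;
            long per_gb = 8192;             /* hypothetical processes per GB */
            long cap = ram_gb * per_gb;     /* 1048576 for a 128GB machine */

            if (cap > 999999)               /* PIDs are 6 digits for now */
                    cap = 999999;
            printf("maxproc cap ~ %ld\n", cap);
            return (0);
    }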



# e6b81333 12-Aug-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix bottlenecks that develop when many processes are running

* When a large number of processes or threads are running (in the tens of
thousands or more), a number of O(n) or O(ncpus) bottlenecks can develop.
These bottlenecks do not develop when only a few thousand threads
are present.

By fixing these bottlenecks, and assuming kern.maxproc is autoconfigured
or manually set high enough, DFly can now handle hundreds of thousands
of active processes running, polling, sleeping, whatever.

Tested to around 400,000 discrete processes (no shared VM pages) on
a 32-thread dual-socket Xeon system. Each process is placed in a
1/10 second sleep loop using umtx timeouts:

baseline - (before changes), system bottlenecked starting
at around the 30,000 process mark, eating all
available cpu, high IPI rate from hash
collisions, and other unrelated user processes
bogged down due to the scheduling overhead.

200,000 processes - System settles down to 45% idle, and low IPI
rate.

220,000 processes - System 30% idle and low IPI rate

250,000 processes - System 0% idle and low IPI rate

300,000 processes - System 0% idle and low IPI rate.

400,000 processes - Scheduler begins to bottleneck again past the
350,000 mark while the process test is still in its
fork/exec loop.

Once all 400,000 processes are settled down,
system behaves fairly well. 0% idle, modest
IPI rate averaging 300 IPI/sec/cpu (due to
hash collisions in the wakeup code).

* More work will be needed to better handle processes with massively
shared VM pages.

It should also be noted that the system does a *VERY* good job
allocating and releasing kernel resources during this test using
discrete processes. It can kill 400,000 processes in a few seconds
when I ^C the test.

* Change lwkt_enqueue()'s linear td_runq scan into a double-ended scan.
This bottleneck does not arise when large numbers of processes are
running in usermode, because typically only one user process per cpu
will be scheduled to LWKT.

However, this bottleneck does arise when large numbers of threads
are woken up in-kernel. While in-kernel, a thread schedules directly
to LWKT. Round-robin operation tends to result in appends to the tail
of the queue, so this optimization saves an enormous amount of cpu
time when large numbers of threads are present.

* Limit ncallout to ~5 minutes worth of ring. The calculation code is
primarily designed to allocate less space on low-memory machines,
but will also cause an excessively-sized ring to be allocated on
large-memory machines. 512MB was observed on a 32-way box.

* Remove vm_map->hint, which had basically stopped functioning in a
useful manner. Add a new vm_map hinting mechanism that caches up to
four (size, align) start addresses for vm_map_findspace(). This cache
is used to quickly index into the linear vm_map_entry list before
entering the linear search phase.

This fixes a serious bottleneck that arises due to vm_map_findspace()'s
linear scan of the vm_map_entry list when the kernel_map becomes
fragmented, typically when the machine is managing a large number of
processes or threads (in the tens of thousands or more).

This will also reduce overheads for processes with highly fragmented
vm_maps (a sketch of the hint-cache idea follows this list).

* Dynamically size the action_hash[] array in vm/vm_page.c. This array
is used to record blocked umtx operations. The limited size of the
array could result in an excessive number of hash entries when a large
number of processes/threads are present in the system. Again, the
effect is noticed as the number of threads exceeds a few tens of
thousands.
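
A hedged sketch of the vm_map hinting mechanism described in the list
above; the structure and eviction policy are illustrative, not the
kernel's:

    #include <stddef.h>
    #include <stdint.h>

    #define MAP_HINTS 4

    struct findspace_hint {
            size_t    size;
            size_t    align;
            uintptr_t start;                /* cached start address, 0 = empty */
    };

    static struct findspace_hint hints[MAP_HINTS];

    /* Return a cached start for (size, align), or 0 on a miss. */
    static uintptr_t
    hint_lookup(size_t size, size_t align)
    {
            int i;

            for (i = 0; i < MAP_HINTS; ++i) {
                    if (hints[i].size == size && hints[i].align == align)
                            return (hints[i].start);
            }
            return (0);
    }

    /* Record a findspace result, evicting entries round-robin. */
    static void
    hint_record(size_t size, size_t align, uintptr_t start)
    {
            static int next;

            hints[next] = (struct findspace_hint){ size, align, start };
            next = (next + 1) % MAP_HINTS;
    }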



Revision tags: v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 6de4cf8a 23-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

vkernel - change hz default, optimize systimer

* Change the hz default to 50

* Refactor the vkernel's systimer code to reduce unnecessary signaling.

* Cleanup kern_clock.c a bit, including renaming HZ to HZ_DEFAULT to avoid
confusion.



# 534ee349 28-Dec-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Implement RLIMIT_RSS, Increase maximum supported swap

* Implement RLIMIT_RSS by forcing pages out to swap if a process's RSS
exceeds the rlimit. Currently the algorithm used to choose the pages
is fairly unsophisticated (we don't have the luxury of a per-process
vm_page_queues[] array).

* Implement the swap_user_async sysctl, default off. This sysctl can be
set to 1 to enable asynchronous paging in the RSS code. This is mostly
for testing and is not recommended since it allows the process to eat
memory more quickly than it can be paged out.

* Reimplement vm.swap_burst_read so the sysctl now specifies the number
of pages that are allowed to be burst. Still disabled by default (will
be enabled in a followup commit).

* Fix an overflow in the nswap_lowat and nswap_hiwat calculations.

* Refactor some of the pageout code to support synchronous direct
paging, which the RSS code uses. The new code also implements a
feature that will move clean pages to PQ_CACHE, making them immediately
reallocatable.

* Refactor the vm_pageout_deficit variable, using atomic ops.

* Fix an issue in vm_pageout_clean() (originally part of the inactive scan)
which prevented clustering from operating properly on write.

* Refactor kern/subr_blist.c and all associated code that uses it to increase
swblk_t from int32_t to int64_t, and to increase the radix supported from
31 bits to 63 bits.

This increases the maximum supported swap from 2TB to some ungodly large
value. Remember that, by default, space for up to 4 swap devices
is preallocated so if you are allocating insane amounts of swap it is
best to do it with four equal-sized partitions instead of one so kernel
memory is efficiently allocated.

* There are two kernel data structures associated with swap. The blmeta
structure which has approximately a 1:8192 ratio (ram:swap) and is
pre-allocated up-front, and the swmeta structure whose KVA is reserved
but not allocated.

The swmeta structure has a 1:341 ratio. It tracks swap assignments for
pages in vm_object's. The kernel limits the number of structures to
approximately half of physical memory, meaning that if you have a machine
with 16GB of ram the maximum amount of swapped-out data you can support
with that is 16/2*341 = 2.7TB. Not that you would actually want to eat
half your ram to actually do that.

A large system with, say, 128GB of ram, would be able to support
128/2*341 = 21TB of swap. The ultimate limitation is the 512GB of KVM.
The swap system can use up to 256GB of this so the maximum swap currently
supported by DragonFly on a machine with > 512GB of ram is going to be
256/2*341 = 43TB. To expand this further would require some adjustments
to increase the amount of KVM supported by the kernel.

* WARNING! swmeta is allocated via zalloc(). Once allocated, the memory
can be reused for swmeta but cannot be freed for use by other subsystems.
You should only configure as much swap as you are willing to reserve ram
for.
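
The sizing arithmetic above, restated as a hedged one-liner you can
check (ratios as quoted in the message):

    #include <stdio.h>

    int
    main(void)
    {
            long ram_gb = 16;
            long max_swapped_gb = ram_gb / 2 * 341; /* 16/2*341 = 2728GB ~ 2.7TB */

            printf("%ldGB ram -> ~%ldGB of swapped-out data\n",
                ram_gb, max_swapped_gb);
            return (0);
    }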



Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0
# 2f0acc22 17-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve physio performance

* See http://apollo.backplane.com/DFlyMisc/nvme_sys03.txt

* Hash the pbuf system. This chops down spin-lock collisions
at high transaction rates (>150K IOPS) by 1000x.

* Implement a pbuf with pre-allocated kernel memory that we
copy into, avoiding page table manipulations and thus
avoiding system-wide invltlb/invlpg IPIs.

* This increases NVMe IOPS tests with three cards from
150K-200K IOPS to 950K IOPS using physio (random read,
4K blocks, from urandom-filled partition, with many
process threads, from 3 NVMe cards in parallel).

* Further adjustments to the vkernel build.
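
A hedged sketch of the hashing idea only; bucket count, lock placement,
and the hash key are illustrative:

    #include <stdint.h>

    #define PBUF_HSIZE 64                   /* hypothetical bucket count */

    struct buf;                             /* opaque here */

    struct pbuf_bucket {
            /* one spinlock per bucket instead of one global lock */
            struct buf *freelist;
    };

    static struct pbuf_bucket pbuf_hash[PBUF_HSIZE];

    static struct pbuf_bucket *
    pbuf_bucket(uintptr_t key)              /* e.g. derived from the thread */
    {
            return (&pbuf_hash[(key ^ (key >> 7)) % PBUF_HSIZE]);
    }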



# d26086c2 12-Jun-2016 Sepherosa Ziehau <sephe@dragonflybsd.org>

kern: Update virtual machine detection a bit

Obtained-from: FreeBSD (partial)


Revision tags: v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2, v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2
# cb963641 05-Mar-2014 Sascha Wildner <saw@online.de>

kernel: Adjust type of vmm_guest to enum vmm_guest_type.

Also rename VMM_LAST -> VMM_GUEST_LAST.


# 5a1223c1 04-Mar-2014 Sascha Wildner <saw@online.de>

kernel: Add more detailed VM detection.

Previously, the kernel global 'vmm_guest' was either 0 or 1 depending
on the VMM bit in the CPU features (which isn't set in all VMs).

This commit adds more detailed information by checking the emulated
BIOS for known strings. The detected VMs include vkernel, which
doesn't strictly fit into the category, but it shares enough
similarities for this to be useful.

Also expose this information in a read-only sysctl (kern.vmm_guest).

The detection code was kind of adapted from FreeBSD (although their
kern.vm_guest works differently).

Tested-by: tuxillo
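
A hedged userland re-creation of the string-matching approach; the
table entries and names are illustrative:

    #include <stdio.h>
    #include <string.h>

    static const struct {
            const char *bios_string;
            const char *guest;
    } vm_bios[] = {
            { "VMware",     "vmware" },
            { "VirtualBox", "vbox"   },
            { "QEMU",       "kvm"    },
    };

    static const char *
    detect_guest(const char *bios_vendor)
    {
            size_t i;

            for (i = 0; i < sizeof(vm_bios) / sizeof(vm_bios[0]); ++i) {
                    if (strstr(bios_vendor, vm_bios[i].bios_string) != NULL)
                            return (vm_bios[i].guest);
            }
            return ("unknown");
    }

    int
    main(void)
    {
            printf("%s\n", detect_guest("QEMU Standard PC")); /* -> kvm */
            return (0);
    }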



Revision tags: v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.4.3
# dc71b7ab 31-May-2013 Justin C. Sherrill <justin@shiningsilence.com>

Correct BSD License clause numbering from 1-2-4 to 1-2-3.

Apparently everyone's doing it:
http://svnweb.freebsd.org/base?view=revision&revision=251069

Submitted-by: "Eitan Adler" <lists at eitanadler.com>



Revision tags: v3.4.2
# 2702099d 06-May-2013 Justin C. Sherrill <justin@shiningsilence.com>

Remove advertising clause from all that isn't contrib or userland bin.

By: Eitan Adler <lists@eitanadler.com>


Revision tags: v3.4.1, v3.4.0, v3.4.0rc, v3.5.0, v3.2.2, v3.2.1, v3.2.0, v3.3.0
# a0264b24 16-Sep-2012 Matthew Dillon <dillon@apollo.backplane.com>

Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2


# 74d62460 15-Sep-2012 Matthew Dillon <dillon@apollo.backplane.com>

kernel - remove bounds on buffer cache nbuf count for 64-bit

* Remove arbitrary 1GB buffer cache limitation

* Adjusted numerous 'int' fields to 'long'. Even though nbuf is not
likely to exceed 2 billion buffers, byte calculations using the
variable began overflowing so just convert that and various other
variables to long.

* Make sure we don't blow-out the temporary valloc() space in early boot
due to nbufs being too large.

* Unbound 'kern.nbuf' specifications in /boot/loader.conf as well.
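
A hedged illustration of the overflow class being fixed here (the
per-buffer size constant is illustrative):

    #include <limits.h>
    #include <stdio.h>

    int
    main(void)
    {
            long nbuf = 2L * 1000 * 1000;   /* plausible on a big-ram box */
            long bytes = nbuf * 16384;      /* ~32.7GB of buffer space */

            if (bytes > INT_MAX)
                    printf("%ld bytes overflows an int (max %d)\n",
                        bytes, INT_MAX);
            return (0);
    }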



Revision tags: v3.0.3
# 9437e5dc 31-May-2012 Matthew Dillon <dillon@apollo.backplane.com>

Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2


# 08771751 24-May-2012 Venkatesh Srinivas <me@endeavour.zapto.org>

kernel -- CLFLUSH support

* Introduce a kernel variable, 'vmm_guest', signifying whether the
kernel is running in a virtual environment, such as KVM. This is
set based on the CPUID2.VMM flag on kernels and set automatically
on virtual kernels.

* Introduce wrappers for CLFLUSH instructions.

* Provide a tunable, hw.clflush_enable, to auto-enable CLFLUSH on h/w (-1),
disable always (0), or enable always (1).

Closes-bug: 2363
Reviewed-by: ftigeot@
From: David Shao, FreeBSD
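
A hedged sketch of the tri-state tunable; the auto-mode policy shown is
a guess at the rationale (distrust CLFLUSH under a hypervisor):

    #include <stdio.h>

    static int hw_clflush_enable = -1;      /* -1 auto, 0 off, 1 on */

    static int
    clflush_usable(int cpu_has_clflush, int vmm_guest)
    {
            if (hw_clflush_enable == 0)
                    return (0);             /* forced off */
            if (hw_clflush_enable == 1)
                    return (1);             /* forced on */
            return (cpu_has_clflush && !vmm_guest);
    }

    int
    main(void)
    {
            printf("bare metal: %d  kvm guest: %d\n",
                clflush_usable(1, 0), clflush_usable(1, 1));
            return (0);
    }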



Revision tags: v3.0.2, v3.0.1, v3.1.0, v3.0.0
# 86d7f5d3 26-Nov-2011 John Marino <draco@marino.st>

Initial import of binutils 2.22 on the new vendor branch

Future versions of binutils will also reside on this branch rather
than continuing to create new binutils branches for each new version.


# 50eff085 02-Nov-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - reformulate the maxusers auto-sizing calculation

* Reformulate the maxusers auto-sizing calculation, which is used as a
basis for mbufs and mbuf cluster calculations. Base the values on
limsize (basically the lower of KVM vs physical memory).

* Remove artificial limits.

* This basically affects x86-64 systems with > 4G of ram, greatly
increasing the default maxusers value and related mbuf limits.
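
A hedged sketch of the new basis; the 2MB-per-user divisor is invented
for illustration:

    #include <stdio.h>

    int
    main(void)
    {
            long kvm_bytes  = 512L << 30;   /* example KVA budget */
            long phys_bytes = 64L << 30;    /* example physical ram */
            long limsize = phys_bytes < kvm_bytes ? phys_bytes : kvm_bytes;
            long maxusers = limsize / (2L << 20);

            printf("maxusers ~ %ld\n", maxusers);
            return (0);
    }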



Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0, v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0, v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0, v2.5.1, v2.4.1, v2.5.0, v2.4.0
# 79634a66 12-Aug-2009 Matthew Dillon <dillon@apollo.backplane.com>

swap, amd64 - increase maximum swap space to 1TB x 4

* The radix can overflow a 32 bit integer even if swblk_t fits in 32 bits.
Expand the radix to 64 bits and thus allow the subr_blist code to operate
up to 2 billion blocks (8TB total).

* Shortcut the common single-swap-device case. We do not have to scan
the radix tree to get available space in the single-device case.

* Change maxswzone and maxbcache to longs and add TUNABLE_LONG_FETCH().

* All the TUNEABLE_*_FETCH() calls and kgetenv_*() calls for integers
call kgetenv_quad().

Adjust kgetenv_quad() to accept a suffix for kilobytes, megabytes,
gigabytes, and terabytes.
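
A hedged userland re-creation of the suffix handling (the kernel
routine differs in detail):

    #include <stdio.h>
    #include <stdlib.h>

    static long long
    parse_quad(const char *s)
    {
            char *end;
            long long v = strtoll(s, &end, 0);

            switch (*end) {
            case 't': case 'T': v <<= 10;   /* FALLTHROUGH */
            case 'g': case 'G': v <<= 10;   /* FALLTHROUGH */
            case 'm': case 'M': v <<= 10;   /* FALLTHROUGH */
            case 'k': case 'K': v <<= 10;
            }
            return (v);
    }

    int
    main(void)
    {
            printf("%lld\n", parse_quad("4g"));     /* 4294967296 */
            return (0);
    }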



Revision tags: v2.3.2, v2.3.1, v2.2.1, v2.2.0, v2.3.0, v2.1.1, v2.0.1
# a46fac56 26-Jun-2005 Matthew Dillon <dillon@dragonflybsd.org>

Move more scheduler-specific defines from various places into usched_bsd4.c
and revamp our scheduler algorithms.

* Get rid of the multiple layers of abstraction in the nice and frequency
calculations.

* Increase the scheduling frequency from 20hz to 50hz.

* Greatly reduce the number of scheduling ticks that occur before a
reschedule is issued.

* Fix a bug where the scheduler was not rescheduling when estcpu drops
a process into a lower priority queue.

* Implement a new batch detection algorithm. This algorithm gives
forked children slightly more batchness than their parents (which
is recovered quickly if the child is interactive), and propagates
estcpu data from exiting children to future forked children (which
handles fork/exec/wait loops such as used by make, scripts, etc).

* Change the way NICE priorities affect process execution. The NICE
value is used in two ways: First, it determines the initial process
priority. The estcpu will have a tendency to correct for this so the NICE
value is also used to control estcpu's decay rate.

This means that niced processes will have both an initial penalty for
startup and stabilization, and an ongoing penalty if they become cpu
bound.

Examples from cpu-bound loops:

CPU PRI NI PID %CPU TIME COMMAND
42 159 -20 706 20.5 0:38.88 /bin/csh /tmp/dowhile
37 159 -15 704 17.6 0:35.09 /bin/csh /tmp/dowhile
29 157 -10 702 15.3 0:30.41 /bin/csh /tmp/dowhile
28 160 -5 700 13.0 0:26.73 /bin/csh /tmp/dowhile
23 160 0 698 11.5 0:20.52 /bin/csh /tmp/dowhile
18 160 5 696 9.2 0:16.85 /bin/csh /tmp/dowhile
13 160 10 694 7.1 0:10.82 /bin/csh /tmp/dowhile
3 160 20 692 1.5 0:02.14 /bin/csh /tmp/dowhile
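
A hedged sketch of the two NICE effects described above; both formulas
are invented for illustration, not usched_bsd4's:

    #include <stdio.h>

    int
    main(void)
    {
            int nice;

            for (nice = -20; nice <= 20; nice += 10) {
                    int start_prio = 160 + nice / 4; /* initial penalty/boost */
                    int decay = 32 - nice / 2;       /* slower decay = lasting penalty */
                    printf("nice %3d -> prio %3d, estcpu decay %d/tick\n",
                        nice, start_prio, decay);
            }
            return (0);
    }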

show more ...

