.\"
.\" Copyright (c) 2006, 2007
.\" The DragonFly Project. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in
.\" the documentation and/or other materials provided with the
.\" distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\" contributors may be used to endorse or promote products derived
.\" from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 27, 2018
.Dt VKERNEL 7
.Os
.Sh NAME
.Nm vkernel ,
.Nm vcd ,
.Nm vkd ,
.Nm vke
.Nd virtual kernel architecture
.Sh SYNOPSIS
.Cd "platform vkernel64 # for 64 bit vkernels"
.Cd "device vcd"
.Cd "device vkd"
.Cd "device vke"
.Pp
.Pa /var/vkernel/boot/kernel/kernel
.Op Fl hdstUvz
.Op Fl c Ar file
.Op Fl e Ar name Ns = Ns Li value : Ns Ar name Ns = Ns Li value : Ns ...
.Op Fl i Ar file
.Op Fl I Ar interface Ns Op Ar :address1 Ns Oo Ar :address2 Oc Ns Oo Ar /netmask Oc Ns Oo Ar =MAC Oc
.Op Fl l Ar cpulock
.Op Fl m Ar size
.Op Fl n Ar numcpus Ns Op Ar :lbits Ns Oo Ar :cbits Oc
.Op Fl p Ar pidfile
.Op Fl r Ar file Ns Op Ar :serno
.Op Fl R Ar file Ns Op Ar :serno
.Sh DESCRIPTION
The
.Nm
architecture allows for running
.Dx
kernels in userland.
.Pp
The following options are available:
.Bl -tag -width ".Fl m Ar size"
.It Fl c Ar file
Specify a read-only CD-ROM image
.Ar file
to be used by the kernel, with the first
.Fl c
option defining
.Li vcd0 ,
the second one
.Li vcd1 ,
and so on.
The first
.Fl r ,
.Fl R ,
or
.Fl c
option specified on the command line will be the boot disk.
The CD9660 filesystem is assumed when booting from this media.
.It Fl d
Disable hardware page table support for
.Nm .
.It Fl e Ar name Ns = Ns Li value : Ns Ar name Ns = Ns Li value : Ns ...
Specify environment variables to be passed to the kernel, with multiple
assignments separated by colons.
This option can be specified more than once.
.It Fl h
Show a list of available options, each with a short description.
.It Fl i Ar file
Specify a memory image
.Ar file
to be used by the virtual kernel.
If no
.Fl i
option is given, the kernel will generate a name of the form
.Pa /var/vkernel/memimg.XXXXXX ,
with the trailing
.Ql X Ns s
being replaced by a sequential number, e.g.\&
.Pa memimg.000001 .
.It Fl I Ar interface Ns Op Ar :address1 Ns Oo Ar :address2 Oc Ns Oo Ar /netmask Oc Ns Oo Ar =MAC Oc
Create a virtual network device, with the first
.Fl I
option defining
.Li vke0 ,
the second one
.Li vke1 ,
and so on.
.Pp
The
.Ar interface
argument is the name of a
.Xr tap 4
device node or the path to a
.Xr vknetd 8
socket.
The
.Pa /dev/
path prefix does not have to be specified and will be automatically prepended
for a device node.
Specifying
.Cm auto
will pick the first unused
.Xr tap 4
device.
.Pp
The
.Ar address1
and
.Ar address2
arguments are the IP addresses of the
.Xr tap 4
and
.Nm vke
interfaces.
Optionally,
.Ar address1
may be of the form
.Li bridge Ns Em X ,
in which case the
.Xr tap 4
interface is added to the specified
.Xr bridge 4
interface.
The
.Nm vke
address is not assigned until the interface is brought up in the guest.
.Pp
The
.Ar netmask
argument applies to all interfaces for which an address is specified.
.Pp
The
.Ar MAC
argument is the MAC address of the
.Nm vke
interface.
If not specified, a pseudo-random one will be generated.
.Pp
When running multiple vkernels it is often more convenient to simply
connect to a
.Xr vknetd 8
socket and let it handle the
.Xr tap 4
and
.Xr bridge 4
setup.
An example of this would be
.Pa /var/run/vknet:0.0.0.0:10.2.0.2/16 .
.It Fl l Ar cpulock
Specify which, if any, real CPUs to lock virtual CPUs to.
.Ar cpulock
is one of
.Cm any ,
.Cm map Ns Op Ns , Ns Ar startCPU ,
or
.Ar CPU .
.Pp
.Cm any
does not map virtual CPUs to real CPUs.
This is the default.
.Pp
.Cm map Ns Op Ns , Ns Ar startCPU
maps each virtual CPU to a real CPU starting with real CPU 0 or
.Ar startCPU
if specified.
.Pp
.Ar CPU
locks all virtual CPUs to the real CPU specified by
.Ar CPU .
.Pp
Locking the vkernel to a set of CPUs is recommended on multi-socket systems
to improve NUMA locality of reference.
.It Fl m Ar size
Specify the amount of memory to be used by the kernel, either in bytes or
with one of the suffixes
.Cm K
.Pq kilobytes ,
.Cm M
.Pq megabytes ,
or
.Cm G
.Pq gigabytes .
Lowercase versions of
.Cm K , M ,
and
.Cm G
are allowed.
.It Fl n Ar numcpus Ns Op Ar :lbits Ns Oo Ar :cbits Oc
.Ar numcpus
specifies the number of CPUs to emulate.
Up to 16 CPUs are supported; the default is 2.
.Pp
.Ar lbits
specifies the number of bits of the APIC ID (which equals the CPU ID) used
to represent the logical ID, and thus controls the number of threads per
core (0 bits: 1 thread, 1 bit: 2 threads).
This parameter is optional (mandatory only if
.Ar cbits
is specified).
.Pp
.Ar cbits
specifies the number of bits of the APIC ID (which equals the CPU ID) used
to represent the core ID, and thus controls the number of cores per package
(0 bits: 1 core, 1 bit: 2 cores).
This parameter is optional.
.It Fl p Ar pidfile
Specify a pidfile in which to store the process ID.
Scripts can use this file to locate the vkernel's PID in order to shut it
down or kill it.
.Pp
The vkernel holds a lock on the pidfile while running.
Scripts may test for the lock to determine whether the pidfile is valid or
stale, so as to avoid accidentally killing an unrelated process.
A command such as
.Ql /usr/bin/lockf -ks -t 0 pidfile echo -n
can be used to test the lock.
A non-zero exit status indicates that the pidfile represents a running
vkernel.
.Pp
An error is issued and the vkernel exits if this file cannot be opened for
writing or if it is already locked by an active vkernel process.
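.Pp
The following
.Xr sh 1
fragment is a minimal sketch of such a script, not something shipped with
.Dx ;
the function names are hypothetical.
It assumes that
.Xr lockf 1
exits with
.Dv EX_TEMPFAIL
(75) when the lock is already held, and it only sends
.Dv SIGTERM ,
which requests a clean shutdown as described in
.Sx SIGNALS ,
after the lock test has shown that the pidfile belongs to a running vkernel.
.Bd -literal
# Hypothetical helpers for a vkernel started with -p pidfile.

# Return 0 if a running vkernel holds the lock on the pidfile.
vkernel_running() {
	pidfile=$1
	[ -f "$pidfile" ] || return 1
	# Try to take the lock without waiting; -k keeps the file.
	/usr/bin/lockf -ks -t 0 "$pidfile" true
	# EX_TEMPFAIL (75): the lock is held by another process.
	[ $? -eq 75 ]
}

# Request a clean shutdown, refusing to act on a stale pidfile.
vkernel_stop() {
	pidfile=$1
	if vkernel_running "$pidfile"; then
		kill -TERM "$(cat "$pidfile")"
	else
		echo "vkernel not running: $pidfile" >&2
		return 1
	fi
}
.Ed
.Pp
Checking for
.Dv EX_TEMPFAIL
specifically, rather than any non-zero status, means that a missing
.Xr lockf 1
binary or an unreadable file is treated as
.Dq not running ,
so no signal is sent.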
.It Fl r Ar file Ns Op Ar :serno
Specify a read-write disk image
.Ar file
to be used by the kernel, with the first
.Fl r
option defining
.Li vkd0 ,
the second one
.Li vkd1 ,
and so on.
A serial number for the virtual disk can be specified in
.Ar serno .
.Pp
The first
.Fl r ,
.Fl R ,
or
.Fl c
option specified on the command line will be the boot disk.
.It Fl R Ar file Ns Op Ar :serno
Works like
.Fl r
but treats the disk image as copy-on-write.
This allows a private copy of the image to be modified without modifying
the image file itself.
The image file is not locked in this situation, so multiple vkernels can
run off the same image file if desired.
.Pp
Since modifications are discarded when the vkernel exits, any data you wish
to retain across invocations needs to be exported over the network prior to
shutdown.
Copy-on-write mode gives you the flexibility to mount the disk image
either read-only or read-write, depending on what is convenient.
However, keep in mind that when mounting a COW image read-write,
modifications will consume system memory and swap space until the vkernel
is shut down.
.It Fl s
Boot into single-user mode.
.It Fl t
Tell the vkernel to use a precise host timer when calculating clock values.
If the TSC is not used, this imposes a higher overhead on the vkernel, as it
has to make a system call to the host every time it wants to get the time.
However, the more precise timer might be necessary for your application.
.Pp
By default, the vkernel uses the TSC CPU timer if possible, or an imprecise
(host-tick-resolution) timer which uses a user-mapped kernel page and does
not have any syscall overhead.
To disable the TSC CPU timer, use the
.Fl e Ar hw.tsc_cputimer_enable=0
flag.
.It Fl U
Enable writing to kernel memory and module loading.
By default, both are disabled for security reasons.
.It Fl v
Turn on verbose booting.
.It Fl z
Force the vkernel's RAM to be pre-zeroed.
This is useful for benchmarking on single-socket systems where the memory
allocation does not have to be NUMA-friendly.
This option is not recommended on multi-socket systems or when the
.Fl l
option is used.
.El
.Sh DEVICES
A number of virtual device drivers exist to supplement the virtual kernel.
.Ss Disk device
The
.Nm vkd
driver allows for up to 16
.Xr vn 4
based disk devices.
The root device will be
.Li vkd0
(see
.Sx EXAMPLES
for further information on how to prepare a root image).
.Ss CD-ROM device
The
.Nm vcd
driver allows for up to 16 virtual CD-ROM devices.
It is essentially a read-only
.Nm vkd
device with a block size of 2048 bytes.
.Ss Network interface
The
.Nm vke
driver supports up to 16 virtual network interfaces which are associated with
.Xr tap 4
devices on the host.
For each
.Nm vke
device, the per-interface read-only
.Xr sysctl 3
variable
.Va hw.vke Ns Em X Ns Va .tap_unit
holds the unit number of the associated
.Xr tap 4
device.
.Pp
By default, half of the available mbuf clusters are distributed equally
among all
.Nm vke
devices, up to a maximum of 256 per device.
This can be overridden with the
.Va hw.vke.max_ringsize
tunable.
Note that the value given is rounded down to a power of two.
.Sh SIGNALS
The virtual kernel only enables
.Dv SIGQUIT
and
.Dv SIGTERM
while operating in regular console mode.
Sending
.Ql \&^\e
.Pq Dv SIGQUIT
to the virtual kernel causes the virtual kernel to enter its internal
.Xr ddb 4
debugger and re-enable all other terminal signals.
Sending
.Dv SIGTERM
to the virtual kernel triggers a clean shutdown by passing a
.Dv SIGUSR2
to the virtual kernel's
.Xr init 8
process.
.Sh DEBUGGING
It is possible to debug the virtual kernel's process directly with
.Xr gdb 1 .
It is recommended to issue
.Ql handle SIGSEGV noprint
to ignore page faults processed by the virtual kernel itself and
.Ql handle SIGUSR1 noprint
to ignore signals used for simulating inter-processor interrupts.
.Sh PROFILING
To compile a vkernel with profiling support, use the
.Va CONFIGARGS
variable to pass
.Fl p
to
.Xr config 8 :
.Bd -literal
cd /usr/src
make -DNO_MODULES CONFIGARGS=-p buildkernel KERNCONF=VKERNEL64
.Ed
.Sh FILES
.Bl -tag -width ".Pa /sys/config/VKERNEL64" -compact
.It Pa /dev/vcdX
.Nm vcd
device nodes
.It Pa /dev/vkdX
.Nm vkd
device nodes
.It Pa /sys/config/VKERNEL64
.Nm
configuration file, for
.Xr config 8
.El
.Sh CONFIGURATION FILES
Your virtual kernel is a complete
.Dx
system, but you might not want to run all the services a normal system runs.
Here is what a typical virtual kernel's
.Pa /etc/rc.conf
file looks like, with some additional possibilities commented out.
.Bd -literal
hostname="vkernel"
network_interfaces="lo0 vke0"
ifconfig_vke0="DHCP"
sendmail_enable="NO"
#syslog_enable="NO"
blanktime="NO"
.Ed
.Sh BOOT DRIVE SELECTION
You can override the default boot drive selection and filesystem using the
.Va vfs.root.mountfrom
kernel environment variable.
Note that the selected filesystem must be compiled into the vkernel and not
loaded as a module.
Because the
.Fl e
option uses colons to separate assignments, the variable data has to be
enclosed in quotes, and those quotes must be escaped from the shell so that
the colon in the value is not misinterpreted.
For example:
.Pp
.Fl e
vfs.root.mountfrom=\\"hammer:vkd0s1d\\"
.Sh DISKLESS OPERATION
To boot a
.Nm
from an NFS root, the following tunables need to be set:
.Bl -tag -width indent
.It Va boot.netif.ip
IP address to be assigned to the vkernel's network interface.
.It Va boot.netif.netmask
Netmask for that IP address.
.It Va boot.netif.gateway
IP address of the default gateway.
.It Va boot.netif.name
Network interface name inside the vkernel.
.It Va boot.nfsroot.server
Host running
.Xr nfsd 8 .
.It Va boot.nfsroot.path
Path on the NFS server where the world and distribution targets have been
installed.
.El
.Pp
An example of how to boot a diskless
.Nm
is given in the
.Sx EXAMPLES
section.
.Sh EXAMPLES
Several steps are necessary in order to prepare the system to build and
run a virtual kernel.
.Ss Setting up the filesystem
The
.Nm
architecture needs a number of files which reside in
.Pa /var/vkernel .
Since these files tend to get rather big and the
.Pa /var
partition is usually of limited size, we recommend creating the directory in
the
.Pa /home
partition, with a link to it in
.Pa /var :
.Bd -literal
mkdir -p /home/var.vkernel/boot
ln -s /home/var.vkernel /var/vkernel
.Ed
.Pp
Next, a filesystem image to be used by the virtual kernel has to be
created and populated (assuming the world has been built previously).
If the image is created on a UFS filesystem, you might want to pre-zero it.
On a HAMMER filesystem, simply truncate-extend the file to the image size,
since HAMMER does not reuse data blocks already present in the file.
.Bd -literal
vnconfig -c -S 2g -T vn0 /var/vkernel/rootimg.01
disklabel -r -w vn0s0 auto
disklabel -e vn0s0 # add `a' partition with fstype `4.2BSD'
newfs /dev/vn0s0a
mount /dev/vn0s0a /mnt
cd /usr/src
make installworld DESTDIR=/mnt
cd etc
make distribution DESTDIR=/mnt
echo '/dev/vkd0s0a / ufs rw 1 1' >/mnt/etc/fstab
echo 'proc /proc procfs rw 0 0' >>/mnt/etc/fstab
.Ed
.Pp
Edit
.Pa /mnt/etc/ttys ,
replace the
.Li console
entry with the following line, and turn off all other gettys.
.Bd -literal
console "/usr/libexec/getty Pc" cons25 on secure
.Ed
.Pp
Replace
.Li \&Pc
with
.Li al.Pc
if you would like to automatically log in as root.
.Pp
Then unmount the image and detach the
.Xr vn 4
device:
.Bd -literal
umount /mnt
vnconfig -u vn0
.Ed
.Ss Compiling the virtual kernel
To compile a virtual kernel, use the
.Li VKERNEL64
kernel configuration file residing in
.Pa /sys/config
(or a configuration file derived from it):
.Bd -literal
cd /usr/src
make -DNO_MODULES buildkernel KERNCONF=VKERNEL64
make -DNO_MODULES installkernel KERNCONF=VKERNEL64 DESTDIR=/var/vkernel
.Ed
.Ss Enabling virtual kernel operation
A special
.Xr sysctl 8 ,
.Va vm.vkernel_enable ,
must be set to enable
.Nm
operation:
.Bd -literal
sysctl vm.vkernel_enable=1
.Ed
.Ss Configuring the network on the host system
In order to access a network interface of the host system from the
.Nm ,
you must add the interface to a
.Xr bridge 4
device which will then be passed to the
.Fl I
option:
.Bd -literal
kldload if_bridge.ko
kldload if_tap.ko
ifconfig bridge0 create
ifconfig bridge0 addm re0 # assuming re0 is the host's interface
ifconfig bridge0 up
.Ed
.Ss Running the kernel
Finally, the virtual kernel can be run:
.Bd -literal
cd /var/vkernel
\&./boot/kernel/kernel -m 1g -r rootimg.01 -I auto:bridge0
.Ed
.Pp
You can issue the
.Xr reboot 8 ,
.Xr halt 8 ,
or
.Xr shutdown 8
commands from inside a virtual kernel.
After a clean shutdown, the
.Xr reboot 8
command will re-exec the virtual kernel binary, while the other two will
cause the virtual kernel to exit.
.Ss Diskless operation (vkernel as an NFS client)
The following example boots a
.Nm
with a
.Xr vknetd 8
network configuration.
The line continuation backslashes have been omitted.
For convenience and to reduce confusion, it is recommended to mount the
server's remote vkernel root on the host running the vkernel binary, using
the same path as the NFS mount.
It is assumed that a full system install has been made to
.Pa /var/vkernel/root ,
using
.Li KERNCONF=VKERNEL64
for the kernel build.
.Bd -literal
\&/var/vkernel/root/boot/kernel/kernel
	-m 1g -n 4 -I /var/run/vknet
	-e boot.netif.ip=10.100.0.2
	-e boot.netif.netmask=255.255.0.0
	-e boot.netif.gateway=10.100.0.1
	-e boot.netif.name=vke0
	-e boot.nfsroot.server=10.0.0.55
	-e boot.nfsroot.path=/var/vkernel/root
.Ed
.Pp
In this example,
.Xr vknetd 8
is assumed to have been started as shown below before running the vkernel,
using an unbridged TAP configuration routed through the host.
IP forwarding must be turned on; in this example the NFS server resides on
a different network that is reachable from the host executing the vkernel
but is not directly on the vkernel's subnet.
.Bd -literal
kldload if_tap
sysctl net.inet.ip.forwarding=1
vknetd -t tap0 10.100.0.1/16
.Ed
.Pp
You can easily run multiple vkernels with the same NFS root, as long as you
assign each one a different IP address on the subnet (for example
10.100.0.2, 10.100.0.3, 10.100.0.4).
You should also be careful with certain directories, particularly
.Pa /var/run
and possibly also
.Pa /var/db ,
depending on what your vkernels are going to be doing.
A shared
.Pa /var/db
can complicate matters with
.Pa /var/db/pkg .
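.Pp
As a sketch only, the
.Xr sh 1
fragment below prints one command line per vkernel, reusing the settings of
the diskless example above and varying only the last octet of
.Va boot.netif.ip .
The function name
.Fn vk_cmd
and the per-instance pidfile paths are hypothetical, not a convention of the
base system.
Since a vkernel uses the terminal it was started from as its console, each
printed command would normally be run in its own terminal.
.Bd -literal
# Hypothetical helper: print the command line for the vkernel
# whose address is 10.100.0.N, where N is the first argument.
KERNEL=/var/vkernel/root/boot/kernel/kernel

vk_cmd() {
	n=$1
	cmd="$KERNEL -m 1g -n 4 -I /var/run/vknet -p /var/run/vkernel.$n.pid"
	cmd="$cmd -e boot.netif.ip=10.100.0.$n"
	cmd="$cmd -e boot.netif.netmask=255.255.0.0"
	cmd="$cmd -e boot.netif.gateway=10.100.0.1"
	cmd="$cmd -e boot.netif.name=vke0"
	cmd="$cmd -e boot.nfsroot.server=10.0.0.55"
	cmd="$cmd -e boot.nfsroot.path=/var/vkernel/root"
	echo "$cmd"
}

# One distinct address (and host-side pidfile) per vkernel.
for n in 2 3 4; do
	vk_cmd "$n"
done
.Ed
.Pp
The pidfiles live on the host, so they do not collide in the shared NFS
root, and each one can be used with the lock test described under
.Fl p .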
.Sh BUILDING THE WORLD UNDER A VKERNEL
The virtual kernel platform does not have all the header files expected
by a world build, so the easiest approach is currently to specify the
pc64 platform (in a 64-bit vkernel) when building the world under a virtual
kernel, like this:
.Bd -literal
vkernel# make MACHINE_PLATFORM=pc64 buildworld
vkernel# make MACHINE_PLATFORM=pc64 installworld
.Ed
.Sh SEE ALSO
.Xr vknet 1 ,
.Xr bridge 4 ,
.Xr ifmedia 4 ,
.Xr tap 4 ,
.Xr vn 4 ,
.Xr sysctl.conf 5 ,
.Xr build 7 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr ifconfig 8 ,
.Xr vknetd 8 ,
.Xr vnconfig 8
.Rs
.%A Aggelos Economopoulos
.%D March 2007
.%T "A Peek at the DragonFly Virtual Kernel"
.Re
.Sh HISTORY
Virtual kernels were introduced in
.Dx 1.7 .
.Sh AUTHORS
.An -nosplit
.An Matt Dillon
thought up and implemented the
.Nm
architecture and wrote the
.Nm vkd
device driver.
.An Sepherosa Ziehau
wrote the
.Nm vke
device driver.
This manual page was written by
.An Sascha Wildner .