.\"
.\" Copyright (c) 2006, 2007
.\"	The DragonFly Project.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 27, 2018
.Dt VKERNEL 7
.Os
.Sh NAME
.Nm vkernel ,
.Nm vcd ,
.Nm vkd ,
.Nm vke
.Nd virtual kernel architecture
.Sh SYNOPSIS
.Cd "platform vkernel64 # for 64 bit vkernels"
.Cd "device vcd"
.Cd "device vkd"
.Cd "device vke"
.Pp
.Pa /var/vkernel/boot/kernel/kernel
.Op Fl hdstUvz
.Op Fl c Ar file
.Op Fl e Ar name Ns = Ns Li value : Ns Ar name Ns = Ns Li value : Ns ...
.Op Fl i Ar file
.Op Fl I Ar interface Ns Op Ar :address1 Ns Oo Ar :address2 Oc Ns Oo Ar /netmask Oc Ns Oo Ar =MAC Oc
.Op Fl l Ar cpulock
.Op Fl m Ar size
.Op Fl n Ar numcpus Ns Op Ar :lbits Ns Oo Ar :cbits Oc
.Op Fl p Ar pidfile
.Op Fl r Ar file Ns Op Ar :serno
.Op Fl R Ar file Ns Op Ar :serno
.Sh DESCRIPTION
The
.Nm
architecture allows for running
.Dx
kernels in userland.
.Pp
The following options are available:
.Bl -tag -width ".Fl m Ar size"
.It Fl c Ar file
Specify a read-only CD-ROM image
.Ar file
to be used by the kernel, with the first
.Fl c
option defining
.Li vcd0 ,
the second one
.Li vcd1 ,
and so on.
The first
.Fl r ,
.Fl R ,
or
.Fl c
option specified on the command line will be the boot disk.
The CD9660 filesystem is assumed when booting from this media.
.It Fl d
Disable hardware page table support for
.Nm .
.It Fl e Ar name Ns = Ns Li value : Ns Ar name Ns = Ns Li value : Ns ...
Specify kernel environment variables to be used by the kernel.
Multiple
.Ar name Ns = Ns Li value
pairs are separated by colons.
This option can be specified more than once.
.It Fl h
Show a list of available options, each with a short description.
.It Fl i Ar file
Specify a memory image
.Ar file
to be used by the virtual kernel.
If no
.Fl i
option is given, the kernel will generate a name of the form
.Pa /var/vkernel/memimg.XXXXXX ,
with the trailing
.Ql X Ns s
being replaced by a sequential number, e.g.\&
.Pa memimg.000001 .
.It Fl I Ar interface Ns Op Ar :address1 Ns Oo Ar :address2 Oc Ns Oo Ar /netmask Oc Ns Oo Ar =MAC Oc
Create a virtual network device, with the first
.Fl I
option defining
.Li vke0 ,
the second one
.Li vke1 ,
and so on.
.Pp
The
.Ar interface
argument is the name of a
.Xr tap 4
device node or the path to a
.Xr vknetd 8
socket.
The
.Pa /dev/
path prefix does not have to be specified; it is prepended automatically
for a device node.
Specifying
.Cm auto
will pick the first unused
.Xr tap 4
device.
.Pp
The
.Ar address1
and
.Ar address2
arguments are the IP addresses of the
.Xr tap 4
and
.Nm vke
interfaces, respectively.
Optionally,
.Ar address1
may be of the form
.Li bridge Ns Em X ,
in which case the
.Xr tap 4
interface is added to the specified
.Xr bridge 4
interface.
The
.Nm vke
address is not assigned until the interface is brought up in the guest.
.Pp
The
.Ar netmask
argument applies to all interfaces for which an address is specified.
.Pp
The
.Ar MAC
argument is the MAC address of the
.Xr vke 4
interface.
If not specified, a pseudo-random one will be generated.
.Pp
When running multiple vkernels, it is often more convenient to simply
connect to a
.Xr vknetd 8
socket and let vknetd deal with the tap and/or bridge.
An example of this would be
.Pa /var/run/vknet:0.0.0.0:10.2.0.2/16 .
.It Fl l Ar cpulock
Specify which, if any, real CPUs to lock virtual CPUs to.
.Ar cpulock
is one of
.Cm any ,
.Cm map Ns Op Ns , Ns Ar startCPU ,
or
.Ar CPU .
.Pp
.Cm any
does not map virtual CPUs to real CPUs.
This is the default.
.Pp
.Cm map Ns Op Ns , Ns Ar startCPU
maps each virtual CPU to a real CPU starting with real CPU 0 or
.Ar startCPU
if specified.
.Pp
.Ar CPU
locks all virtual CPUs to the real CPU specified by
.Ar CPU .
.Pp
Locking the vkernel to a set of CPUs is recommended on multi-socket systems
to improve NUMA locality of reference.
.It Fl m Ar size
Specify the amount of memory to be used by the kernel in bytes,
.Cm K
.Pq kilobytes ,
.Cm M
.Pq megabytes ,
or
.Cm G
.Pq gigabytes .
Lowercase versions of
.Cm K , M ,
and
.Cm G
are allowed.
.It Fl n Ar numcpus Ns Op Ar :lbits Ns Oo Ar :cbits Oc
.Ar numcpus
specifies the number of CPUs you wish to emulate.
Up to 16 CPUs are supported; the default is 2.
.Pp
.Ar lbits
specifies the number of bits within APICID(=CPUID) needed for representing
the logical ID.
It controls the number of threads per core
(0 bits: 1 thread, 1 bit: 2 threads).
This parameter is optional, but it must be given if
.Ar cbits
is specified.
.Pp
.Ar cbits
specifies the number of bits within APICID(=CPUID) needed for representing
the core ID.
It controls the number of cores per package
(0 bits: 1 core, 1 bit: 2 cores).
This parameter is optional.
.It Fl p Ar pidfile
Specify a pidfile in which to store the process ID.
Scripts can use this file to locate the vkernel pid for the purpose of
shutting down or killing it.
.Pp
The vkernel will hold a lock on the pidfile while running.
Scripts may test for the lock to determine whether the pidfile is valid or
stale, so as to avoid accidentally killing a random process.
A command such as
.Dl /usr/bin/lockf -ks -t 0 pidfile echo -n
may be used to test the lock.
A non-zero exit code indicates that the pidfile represents a running
vkernel.
.Pp
An error is issued and the vkernel exits if this file cannot be opened for
writing or if it is already locked by an active vkernel process.
.It Fl r Ar file Ns Op Ar :serno
Specify a read-write disk image
.Ar file
to be used by the kernel, with the first
.Fl r
option defining
.Li vkd0 ,
the second one
.Li vkd1 ,
and so on.
A serial number for the virtual disk can be specified in
.Ar serno .
.Pp
The first
.Fl r ,
.Fl R ,
or
.Fl c
option specified on the command line will be the boot disk.
.It Fl R Ar file Ns Op Ar :serno
Works like
.Fl r
but treats the disk image as copy-on-write.
This allows a private copy of the image to be modified without
modifying the image file itself.
The image file will not be locked in this situation, and multiple
vkernels can run off the same image file if desired.
.Pp
Since modifications are discarded when the vkernel exits, any data you
wish to retain across invocations needs to be exported over
the network prior to shutdown.
Because the image file itself is never written, you are free to mount
the disk image either read-only or read-write, whichever is more
convenient.
However, keep in mind that when mounting a COW image
read-write, modifications will eat system memory and
swap space until the vkernel is shut down.
277swap space until the vkernel is shut down.
278.It Fl s
279Boot into single-user mode.
280.It Fl t
281Tell the vkernel to use a precise host timer when calculating clock values.
282If the TSC isn't used, this will impose higher overhead on the vkernel as it
283will have to make a system call to the real host every time it wants to get
284the time.
285However, the more precise timer might be necessary for your application.
286.Pp
287By default, the vkernel uses the TSC cpu timer if possible, or an imprecise
288(host-tick-resolution) timer which uses a user-mapped kernel page and does
289not have any syscall overhead.
290To disable the TSC cpu timer, use the
291.Fl e Ar hw.tsc_cputimer_enable=0
292flag.
293.It Fl U
294Enable writing to kernel memory and module loading.
295By default, those are disabled for security reasons.
296.It Fl v
297Turn on verbose booting.
298.It Fl z
299Force the vkernel's ram to be pre-zerod.  Useful for benchmarking on
300single-socket systems where the memory allocation does not have to be
301NUMA-friendly.
302This options is not recommended on multi-socket systems or when the
303.Fl l
304option is used.
305.El
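.Pp
The topology implied by the
.Fl n
option can be worked out with shell arithmetic: each core has 2 to the power of
.Ar lbits
threads, each package has 2 to the power of
.Ar cbits
cores, and dividing
.Ar numcpus
by their product gives the number of packages.
The following sketch uses a hypothetical
.Li vk_topology
shell function, which is not part of
.Nm ,
and assumes that
.Ar numcpus
is a multiple of the package size:
.Bd -literal
# Hypothetical helper: print the topology described by -n numcpus:lbits:cbits.
vk_topology() {
	ncpus=$1 lbits=${2:-0} cbits=${3:-0}
	threads=$((1 << lbits))		# threads per core
	cores=$((1 << cbits))		# cores per package
	packages=$((ncpus / (threads * cores)))
	echo "$packages packages x $cores cores x $threads threads"
}
vk_topology 8 1 1	# prints "2 packages x 2 cores x 2 threads"
.Ed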
.Sh DEVICES
A number of virtual device drivers exist to supplement the virtual kernel.
.Ss Disk device
The
.Nm vkd
driver allows for up to 16
.Xr vn 4
based disk devices.
The root device will be
.Li vkd0
(see
.Sx EXAMPLES
for further information on how to prepare a root image).
.Ss CD-ROM device
The
.Nm vcd
driver allows for up to 16 virtual CD-ROM devices.
It is essentially a read-only
.Nm vkd
device with a block size of 2048 bytes.
.Ss Network interface
The
.Nm vke
driver supports up to 16 virtual network interfaces which are associated with
.Xr tap 4
devices on the host.
For each
.Nm vke
device, the per-interface read-only
.Xr sysctl 3
variable
.Va hw.vke Ns Em X Ns Va .tap_unit
holds the unit number of the associated
.Xr tap 4
device.
.Pp
By default, half of the available mbuf clusters are distributed equally
among all
.Nm vke
devices, up to a maximum of 256 per device.
This can be overridden with the tunable
.Va hw.vke.max_ringsize .
Note that the value given is rounded down to a power of two.
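.Pp
The resulting ring size can be estimated with shell arithmetic.
The following sketch uses a hypothetical
.Li vke_ringsize
function that takes the total number of mbuf clusters, the number of
.Nm vke
devices, and optionally a
.Va hw.vke.max_ringsize
value.
It assumes that the limit of 256 applies per device and that the default
is rounded down to a power of two just like the tunable, which may not
match the driver exactly:
.Bd -literal
# Hypothetical helper: estimate the per-device vke ring size.
# Usage: vke_ringsize nmbclusters ndevices [max_ringsize]
vke_ringsize() {
	if [ -n "$3" ]; then
		n=$3				# tunable given explicitly
	else
		n=$(( $1 / 2 / $2 ))		# half the clusters, split per device
		[ "$n" -gt 256 ] && n=256	# assumed per-device cap
	fi
	p=1					# round down to a power of two
	while [ $((p * 2)) -le "$n" ]; do
		p=$((p * 2))
	done
	echo "$p"
}
.Ed
.Pp
Since tunables are read from the kernel environment, the ring size can
presumably also be set at boot with, for example,
.Fl e Ar hw.vke.max_ringsize=128 .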
.Sh SIGNALS
The virtual kernel only enables
.Dv SIGQUIT
and
.Dv SIGTERM
while operating in regular console mode.
Sending
.Ql \&^\e
.Pq Dv SIGQUIT
to the virtual kernel causes the virtual kernel to enter its internal
.Xr ddb 4
debugger and re-enable all other terminal signals.
Sending
.Dv SIGTERM
to the virtual kernel triggers a clean shutdown by passing a
.Dv SIGUSR2
to the virtual kernel's
.Xr init 8
process.
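.Pp
A shutdown script can combine the
.Fl p
pidfile lock test with
.Dv SIGTERM
to stop a running vkernel cleanly.
The following is a minimal sketch; the
.Li vk_stop
function name is hypothetical, and its argument is whatever path was
passed to
.Fl p :
.Bd -literal
# Hypothetical helper: cleanly stop the vkernel that owns pidfile $1.
vk_stop() {
	pidfile=$1
	# lockf(1) can only take the lock if no vkernel holds it, in which
	# case the pidfile is stale and its pid must not be trusted.
	if lockf -ks -t 0 "$pidfile" echo -n; then
		echo "vk_stop: $pidfile is stale, not killing" >&2
		return 1
	fi
	# A non-zero lockf exit means a vkernel is running; SIGTERM makes it
	# pass SIGUSR2 to its init(8), which performs a clean shutdown.
	kill -TERM "$(cat "$pidfile")"
}
.Ed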
.Sh DEBUGGING
The virtual kernel's process can be debugged directly with
.Xr gdb 1 .
It is recommended that you use
.Ql handle SIGSEGV noprint
to ignore page faults processed by the virtual kernel itself and
.Ql handle SIGUSR1 noprint
to ignore signals used for simulating inter-processor interrupts.
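.Pp
For example, assuming the vkernel was started with
.Fl p Pa /var/run/vkernel.pid
(a hypothetical path), both settings can be given on the
.Xr gdb 1
command line while attaching to the running process:
.Bd -literal
kernel=/var/vkernel/boot/kernel/kernel
pid=$(cat /var/run/vkernel.pid)
gdb -ex 'handle SIGSEGV noprint' -ex 'handle SIGUSR1 noprint' "$kernel" "$pid"
.Ed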
.Sh PROFILING
To compile a vkernel with profiling support, the
.Va CONFIGARGS
variable needs to be used to pass
.Fl p
to
.Xr config 8 .
.Bd -literal
cd /usr/src
make -DNO_MODULES CONFIGARGS=-p buildkernel KERNCONF=VKERNEL64
.Ed
.Sh FILES
.Bl -tag -width ".Pa /sys/config/VKERNEL64" -compact
.It Pa /dev/vcdX
.Nm vcd
device nodes
.It Pa /dev/vkdX
.Nm vkd
device nodes
.It Pa /sys/config/VKERNEL64
.Nm
configuration file, for
.Xr config 8 .
.El
.Sh CONFIGURATION FILES
Your virtual kernel is a complete
.Dx
system, but you might not want to run all the services a normal system runs.
Here is what a typical virtual kernel's
.Pa /etc/rc.conf
file looks like, with some additional possibilities commented out.
.Bd -literal
hostname="vkernel"
network_interfaces="lo0 vke0"
ifconfig_vke0="DHCP"
sendmail_enable="NO"
#syslog_enable="NO"
blanktime="NO"
.Ed
.Sh BOOT DRIVE SELECTION
You can override the default boot drive selection and filesystem
using a kernel environment variable.
Note that the selected filesystem must be compiled into the vkernel,
not loaded as a module.
Because the
.Fl e
option uses the colon as a separator, the quotes around the variable data
must be escaped so that the colon in the value is not misinterpreted.
For example:
.Pp
.Fl e
vfs.root.mountfrom=\\"hammer:vkd0s1d\\"
.Sh DISKLESS OPERATION
To boot a
.Nm
from an NFS root, a number of tunables need to be set:
.Bl -tag -width indent
.It Va boot.netif.ip
IP address to be set on the vkernel interface.
.It Va boot.netif.netmask
Netmask for the IP address to be set.
.It Va boot.netif.gateway
Default gateway for the vkernel, needed when the NFS server is not on the
vkernel's own subnet.
.It Va boot.netif.name
Network interface name inside the vkernel.
.It Va boot.nfsroot.server
Host running
.Xr nfsd 8 .
.It Va boot.nfsroot.path
Path on the NFS server where the world and distribution
targets have been installed.
.El
.Pp
An example of how to boot a diskless
.Nm
can be found in the
.Sx EXAMPLES
section.
.Sh EXAMPLES
A couple of steps are necessary in order to prepare the system to build and
run a virtual kernel.
.Ss Setting up the filesystem
The
.Nm
architecture needs a number of files which reside in
.Pa /var/vkernel .
Since these files tend to get rather big and the
.Pa /var
partition is usually of limited size, we recommend creating the directory
on the
.Pa /home
partition with a symbolic link to it in
.Pa /var :
.Bd -literal
mkdir -p /home/var.vkernel/boot
ln -s /home/var.vkernel /var/vkernel
.Ed
.Pp
Next, a filesystem image to be used by the virtual kernel has to be
created and populated (assuming world has been built previously).
If the image is created on a UFS filesystem, you might want to pre-zero it.
On a HAMMER filesystem, you should just truncate-extend it to the image size,
as HAMMER does not re-use data blocks already present in the file.
.Bd -literal
vnconfig -c -S 2g -T vn0 /var/vkernel/rootimg.01
disklabel -r -w vn0s0 auto
disklabel -e vn0s0	# add `a' partition with fstype `4.2BSD'
newfs /dev/vn0s0a
mount /dev/vn0s0a /mnt
cd /usr/src
make installworld DESTDIR=/mnt
cd etc
make distribution DESTDIR=/mnt
echo '/dev/vkd0s0a	/	ufs	rw	1  1' >/mnt/etc/fstab
echo 'proc		/proc	procfs	rw	0  0' >>/mnt/etc/fstab
.Ed
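.Pp
For example, an image file that lives on UFS could be pre-zeroed with
.Xr dd 1
before running the
.Xr vnconfig 8
command above, while on HAMMER extending the file without writing any
data is enough.
Both commands below produce a 2048 MB file to match the
.Fl S Ar 2g
argument; run only the one that matches the host filesystem:
.Bd -literal
# UFS: write 2048 blocks of 1 MB of zeros into the image file.
dd if=/dev/zero of=/var/vkernel/rootimg.01 bs=1m count=2048
# HAMMER: extend the file to 2048 MB without writing any data.
dd if=/dev/zero of=/var/vkernel/rootimg.01 bs=1m seek=2048 count=0
.Ed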
.Pp
Edit
.Pa /mnt/etc/ttys ,
replace the
.Li console
entry with the following line, and turn off all other gettys:
.Bd -literal
console	"/usr/libexec/getty Pc"		cons25	on  secure
.Ed
.Pp
Replace
.Li \&Pc
with
.Li al.Pc
if you would like to automatically log in as root.
.Pp
Then unmount the disk:
.Bd -literal
umount /mnt
vnconfig -u vn0
.Ed
.Ss Compiling the virtual kernel
In order to compile a virtual kernel, use the
.Li VKERNEL64
kernel configuration file residing in
.Pa /sys/config
(or a configuration file derived from it):
.Bd -literal
cd /usr/src
make -DNO_MODULES buildkernel KERNCONF=VKERNEL64
make -DNO_MODULES installkernel KERNCONF=VKERNEL64 DESTDIR=/var/vkernel
.Ed
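.Pp
A derived configuration can be created by copying
.Li VKERNEL64
under a new name, editing the copy, and passing that name in
.Va KERNCONF
instead.
In this sketch,
.Li MYVKERNEL
is a hypothetical name:
.Bd -literal
cp /sys/config/VKERNEL64 /sys/config/MYVKERNEL
# edit /sys/config/MYVKERNEL as needed, then:
cd /usr/src
make -DNO_MODULES buildkernel KERNCONF=MYVKERNEL
make -DNO_MODULES installkernel KERNCONF=MYVKERNEL DESTDIR=/var/vkernel
.Ed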
.Ss Enabling virtual kernel operation
A special
.Xr sysctl 8 ,
.Va vm.vkernel_enable ,
must be set to enable
.Nm
operation:
.Bd -literal
sysctl vm.vkernel_enable=1
.Ed
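.Pp
To keep this setting across reboots of the host, it can also be added to
.Xr sysctl.conf 5 :
.Bd -literal
echo 'vm.vkernel_enable=1' >> /etc/sysctl.conf
.Ed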
.Ss Configuring the network on the host system
In order to access a network interface of the host system from the
.Nm ,
you must add the interface to a
.Xr bridge 4
device, which will then be passed to the
.Fl I
option:
.Bd -literal
kldload if_bridge.ko
kldload if_tap.ko
ifconfig bridge0 create
ifconfig bridge0 addm re0	# assuming re0 is the host's interface
ifconfig bridge0 up
.Ed
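.Pp
To set up the bridge automatically at boot, the modules can be loaded from
.Xr loader.conf 5
and the bridge created from
.Xr rc.conf 5 .
The following fragment is a sketch that assumes the usual
.Va if_bridge_load ,
.Va if_tap_load ,
.Va cloned_interfaces ,
and
.Va ifconfig_bridge0
variables; as above,
.Li re0
stands for the host's interface:
.Bd -literal
# /boot/loader.conf
if_bridge_load="YES"
if_tap_load="YES"
# /etc/rc.conf
cloned_interfaces="bridge0"
ifconfig_bridge0="addm re0 up"
.Ed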
.Ss Running the kernel
Finally, the virtual kernel can be run:
.Bd -literal
cd /var/vkernel
\&./boot/kernel/kernel -m 1g -r rootimg.01 -I auto:bridge0
.Ed
.Pp
You can issue the
.Xr reboot 8 ,
.Xr halt 8 ,
or
.Xr shutdown 8
commands from inside a virtual kernel.
After doing a clean shutdown, the
.Xr reboot 8
command will re-exec the virtual kernel binary while the other two will
cause the virtual kernel to exit.
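.Pp
A more complete invocation can combine several of the options described
above.
In the following sketch,
.Pa install.iso
and
.Pa /var/run/vkernel.pid
are hypothetical names.
The four virtual CPUs are locked to real CPUs 4 to 7,
.Li vkd0
is the boot disk because
.Fl r
comes before
.Fl c ,
and the CD-ROM image shows up as
.Li vcd0 :
.Bd -literal
cd /var/vkernel
# $opts is deliberately unquoted below so that it splits into arguments.
opts="-m 1g -n 4 -l map,4 -p /var/run/vkernel.pid"
/var/vkernel/boot/kernel/kernel $opts -r rootimg.01 -c install.iso -I auto:bridge0
.Ed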
.Ss Diskless operation (vkernel as an NFS client)
The following example boots a
.Nm
with a
.Xr vknetd 8
network configuration.
For convenience and to reduce confusion, it is recommended to mount
the server's remote vkernel root onto the host running the vkernel binary,
using the same path as the NFS mount.
It is assumed that a full system has been installed to
.Pa /var/vkernel/root ,
using
.Li KERNCONF=VKERNEL64
for the kernel build.
.Bd -literal
\&/var/vkernel/root/boot/kernel/kernel \e
	-m 1g -n 4 -I /var/run/vknet \e
	-e boot.netif.ip=10.100.0.2 \e
	-e boot.netif.netmask=255.255.0.0 \e
	-e boot.netif.gateway=10.100.0.1 \e
	-e boot.netif.name=vke0 \e
	-e boot.nfsroot.server=10.0.0.55 \e
	-e boot.nfsroot.path=/var/vkernel/root
.Ed
.Pp
In this example,
.Xr vknetd 8
is assumed to have been started as shown below before
running the vkernel, using an unbridged TAP configuration routed through
the host.
IP forwarding must be turned on, and in this example the server resides
on a different network that is accessible to the host executing the vkernel
but not directly on the vkernel's subnet.
.Bd -literal
kldload if_tap
sysctl net.inet.ip.forwarding=1
vknetd -t tap0 10.100.0.1/16
.Ed
.Pp
You can run multiple vkernels with the same NFS root as long as
you assign each one a different IP address on the subnet
(10.100.0.2, 10.100.0.3, 10.100.0.4, and so on).
You should also be careful with certain directories, particularly
.Pa /var/run
and possibly also
.Pa /var/db ,
depending on what your vkernels are going to be doing;
sharing
.Pa /var/db
can complicate matters with
.Pa /var/db/pkg .
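.Pp
Starting several such vkernels can be scripted.
The following sketch defines a hypothetical
.Li vk_diskless
function that takes the last octet of the vkernel's address, joins the
tunables from the example above with colons into a single
.Fl e
option, and gives each instance its own pidfile.
Since each vkernel uses the terminal it was started from as its console,
run each instance from a separate terminal.
The
.Ev VK_KERNEL
variable is likewise hypothetical and only makes the kernel path easy to
override:
.Bd -literal
# Hypothetical helper: boot diskless vkernel number $1 (2, 3, 4, ...).
vk_diskless() {
	n=$1
	kernel=${VK_KERNEL:-/var/vkernel/root/boot/kernel/kernel}
	tunables="boot.netif.ip=10.100.0.$n:boot.netif.netmask=255.255.0.0"
	tunables="$tunables:boot.netif.gateway=10.100.0.1:boot.netif.name=vke0"
	tunables="$tunables:boot.nfsroot.server=10.0.0.55"
	tunables="$tunables:boot.nfsroot.path=/var/vkernel/root"
	"$kernel" -m 1g -n 4 -I /var/run/vknet -p "/var/run/vkernel.$n.pid" -e "$tunables"
}
.Ed
.Pp
For example,
.Ql vk_diskless 3
boots the instance at 10.100.0.3.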
.Sh BUILDING THE WORLD UNDER A VKERNEL
The virtual kernel platform does not have all the header files expected
by a world build, so the easiest approach is to specify the
pc64 platform when building the world under a 64-bit virtual kernel,
like this:
.Bd -literal
vkernel# make MACHINE_PLATFORM=pc64 buildworld
vkernel# make MACHINE_PLATFORM=pc64 installworld
.Ed
.Sh SEE ALSO
.Xr vknet 1 ,
.Xr bridge 4 ,
.Xr ifmedia 4 ,
.Xr tap 4 ,
.Xr vn 4 ,
.Xr sysctl.conf 5 ,
.Xr build 7 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr ifconfig 8 ,
.Xr vknetd 8 ,
.Xr vnconfig 8
.Rs
.%A Aggelos Economopoulos
.%D March 2007
.%T "A Peek at the DragonFly Virtual Kernel"
.Re
.Sh HISTORY
Virtual kernels were introduced in
.Dx 1.7 .
.Sh AUTHORS
.An -nosplit
.An Matt Dillon
thought up and implemented the
.Nm
architecture and wrote the
.Nm vkd
device driver.
.An Sepherosa Ziehau
wrote the
.Nm vke
device driver.
This manual page was written by
.An Sascha Wildner .