.. BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

VM Power Management Application
===============================

Introduction
------------

Applications running in Virtual Environments have an abstract view of
the underlying hardware on the Host; in particular, applications cannot see
the binding of virtual to physical hardware.
When looking at CPU resourcing, the pinning of Virtual CPUs (vCPUs) to
Host Physical CPUs (pCPUs) is not apparent to an application,
and this pinning may change over time.
Furthermore, Operating Systems on virtual machines do not have the ability
to govern their own power policy; the Model Specific Registers (MSRs)
for enabling P-State transitions are not exposed to Operating Systems
running on Virtual Machines (VMs).

The Virtual Machine Power Management solution shows an example of
how a DPDK application can indicate its processing requirements, using
VM-local information only (vCPU/lcore), to a Host-based Monitor.
The Monitor is responsible for accepting requests for frequency changes for a vCPU,
translating the vCPU to a pCPU via libvirt, and effecting the change in frequency.

The solution comprises two high-level components:

#. Example Host Application

   A Command Line Interface (CLI) for VM->Host communication channel management
   allows adding channels to the Monitor, setting and querying the vCPU to pCPU pinning,
   and inspecting and manually changing the frequency for each CPU.
   The CLI runs on a single lcore while the thread responsible for managing
   VM requests runs on a second lcore.

   VM requests arriving on a channel for frequency changes are passed
   to the librte_power ACPI cpufreq sysfs based library.
   The Host Application relies on both qemu-kvm and libvirt to function.

#. librte_power for Virtual Machines

   Using an alternate implementation for the librte_power API, requests for
   frequency changes are forwarded to the host monitor rather than
   the ACPI cpufreq sysfs interface used on the host.

   The l3fwd-power application will use this implementation when deployed on a VM
   (see Chapter 11 "L3 Forwarding with Power Management Application").

.. _figure_24:

**Figure 24.
High-level Solution**

|vm_power_mgr_highlevel|

Overview
--------

VM Power Management employs qemu-kvm to provide communications channels
between the host and VMs in the form of Virtio-Serial, which appears as
a paravirtualized serial device on a VM and can be configured to use
various backends on the host. For this example, each Virtio-Serial endpoint
on the host is configured as an AF_UNIX file socket, supporting poll/select
and epoll for event notification.
In this example, each channel endpoint on the host is monitored via
epoll for EPOLLIN events.
Each channel is specified as qemu-kvm arguments or as libvirt XML for each VM,
where each VM can have a number of channels up to a maximum of 64 per VM.
In this example, each DPDK lcore on a VM has exclusive access to a channel.

To enable frequency changes from within a VM, a request via the librte_power interface
is forwarded via Virtio-Serial to the host. Each request contains the vCPU
and power command (scale up/down/min/max).
The API for host and guest librte_power is consistent across environments,
with the selection of VM or Host implementation determined automatically
at runtime based on the environment.

Upon receiving a request, the host translates the vCPU to a pCPU via
the libvirt API before forwarding to the host librte_power.

.. _figure_25:

**Figure 25. VM request to scale frequency**

|vm_power_mgr_vm_request_seq|

Performance Considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~

While the Haswell microarchitecture allows for independent power control for each core,
earlier microarchitectures do not offer such fine-grained control.
When deployed on pre-Haswell platforms, greater care must be taken in selecting
which cores are assigned to a VM; for instance, a core will not scale down
until its sibling is similarly scaled.
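
When selecting cores on a pre-Haswell host, it can help to know which logical CPUs
share a physical core. The following is a minimal sketch, assuming that "sibling" above
refers to hyper-thread siblings and that the host exposes the standard Linux sysfs
topology file *thread_siblings_list*. The helper names *parse_cpu_list* and
*thread_siblings* are illustrative only and are not part of DPDK:

.. code-block:: python

    from pathlib import Path


    def parse_cpu_list(text):
        """Expand a sysfs CPU list such as "0,4" or "0-2,8" into a sorted list."""
        cpus = set()
        for part in text.strip().split(","):
            if not part:
                continue
            if "-" in part:
                first, last = part.split("-")
                cpus.update(range(int(first), int(last) + 1))
            else:
                cpus.add(int(part))
        return sorted(cpus)


    def thread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
        """Return the logical CPUs that share a physical core with `cpu`.

        Assumption: a Linux host that provides the sysfs topology directory.
        """
        path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
        return parse_cpu_list(path.read_text())

For example, *parse_cpu_list("0,4")* expands to the CPUs 0 and 4. One option on such a
platform is to assign both CPUs of a sibling pair to the same VM, so that a single VM
decides when both are scaled down.
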

Configuration
-------------

BIOS
~~~~

Enhanced Intel SpeedStep® Technology must be enabled in the platform BIOS
if the power management feature of DPDK is to be used.
Otherwise, the sysfs directory /sys/devices/system/cpu/cpu0/cpufreq will not exist,
and the CPU frequency-based power management cannot be used.
Consult the relevant BIOS documentation to determine how these settings
can be accessed.

Host Operating System
~~~~~~~~~~~~~~~~~~~~~

The Host OS must also have the *acpi_cpufreq* module installed. In some cases
the *intel_pstate* driver may be the default Power Management environment.
To enable *acpi_cpufreq* and disable *intel_pstate*, add the following
to the grub Linux command line:

.. code-block:: console

    intel_pstate=disable

Upon rebooting, load the *acpi_cpufreq* module:

.. code-block:: console

    modprobe acpi_cpufreq

Hypervisor Channel Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Virtio-Serial channels are configured via libvirt XML:

.. code-block:: xml

    <name>{vm_name}</name>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <channel type='unix'>
      <source mode='bind' path='/tmp/powermonitor/{vm_name}.{channel_num}'/>
      <target type='virtio' name='virtio.serial.port.poweragent.{vm_channel_num}'/>
      <address type='virtio-serial' controller='0' bus='0' port='{N}'/>
    </channel>

A single controller of type *virtio-serial* is created, up to 32 channels
can be associated with a single controller, and multiple controllers can be specified.
The convention is to use the name of the VM in the host path *{vm_name}* and
to increment *{channel_num}* for each channel; likewise, the port value *{N}*
must be incremented for each channel.
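
The naming convention above lends itself to generating the channel elements rather than
writing them by hand. The following is a minimal sketch and not part of DPDK or libvirt:
the function names *channel_elements* and *channels_xml* are hypothetical, the sketch
assumes that *{vm_channel_num}* equals *{channel_num}*, and the starting port number
(*first_port*) is an assumption that should be checked against the VM configuration:

.. code-block:: python

    import xml.etree.ElementTree as ET

    MAX_CHANNELS_PER_CONTROLLER = 32  # limit stated in the text above


    def channel_elements(vm_name, num_channels, first_port=1, controller=0):
        """Build one <channel> element per channel for a single controller."""
        if num_channels > MAX_CHANNELS_PER_CONTROLLER:
            raise ValueError("too many channels for a single controller")
        channels = []
        for channel_num in range(num_channels):
            channel = ET.Element("channel", {"type": "unix"})
            ET.SubElement(channel, "source", {
                "mode": "bind",
                "path": f"/tmp/powermonitor/{vm_name}.{channel_num}",
            })
            ET.SubElement(channel, "target", {
                "type": "virtio",
                # Assumption: the guest channel number matches the host one.
                "name": f"virtio.serial.port.poweragent.{channel_num}",
            })
            ET.SubElement(channel, "address", {
                "type": "virtio-serial",
                "controller": str(controller),
                "bus": "0",
                # Assumption: ports are numbered consecutively from first_port.
                "port": str(first_port + channel_num),
            })
            channels.append(channel)
        return channels


    def channels_xml(vm_name, num_channels):
        """Serialize the generated elements, one <channel> per line."""
        return "\n".join(
            ET.tostring(element, encoding="unicode")
            for element in channel_elements(vm_name, num_channels)
        )

Calling *channels_xml("vm0", 4)* produces four channel elements whose host paths run from
*/tmp/powermonitor/vm0.0* to */tmp/powermonitor/vm0.3*; the output can then be pasted into
the devices section of the libvirt domain XML.
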

Each channel on the host will appear in *path*. The directory */tmp/powermonitor/*
must first be created and given qemu permissions:

.. code-block:: console

    mkdir /tmp/powermonitor/
    chown qemu:qemu /tmp/powermonitor

Note that files and directories within /tmp are generally removed upon
rebooting the host, so the above steps may need to be carried out after each reboot.

The serial device as it appears on a VM is configured with the *target* element attribute *name*
and must be in the form of *virtio.serial.port.poweragent.{vm_channel_num}*,
where *vm_channel_num* is typically the lcore channel to be used in DPDK VM applications.

Each channel on a VM will be present at */dev/virtio-ports/virtio.serial.port.poweragent.{vm_channel_num}*.

Compiling and Running the Host Application
------------------------------------------

Compiling
~~~~~~~~~

#. export RTE_SDK=/path/to/rte_sdk
#. cd ${RTE_SDK}/examples/vm_power_manager
#. make

Running
~~~~~~~

The application does not have any specific command line options other than *EAL*:

.. code-block:: console

    ./build/vm_power_mgr [EAL options]

The application requires exactly two cores to run: one core is dedicated to the CLI,
while the other is dedicated to the channel endpoint monitor. For example, to run
on cores 0 & 1 on a system with 4 memory channels:

.. code-block:: console

    ./build/vm_power_mgr -c 0x3 -n 4

After successful initialization, the user is presented with the VM Power Manager CLI:

.. code-block:: console

    vm_power>

Virtual Machines can now be added to the VM Power Manager:

.. code-block:: console

    vm_power> add_vm {vm_name}

When a {vm_name} is specified with the *add_vm* command, a lookup is performed
with libvirt to ensure that the VM exists. {vm_name} is used as a unique identifier
to associate channels with a particular VM and for executing operations on a VM within the CLI.
VMs do not have to be running in order to add them.

A number of commands can be issued via the CLI in relation to VMs:

  Remove a Virtual Machine identified by {vm_name} from the VM Power Manager.

  .. code-block:: console

    rm_vm {vm_name}

  Add communication channels for the specified VM. The virtio channels must be enabled
  in the VM configuration (qemu/libvirt) and the associated VM must be active.
  {list} is a comma-separated list of channel numbers to add; using the keyword 'all'
  will attempt to add all channels for the VM:

  .. code-block:: console

    add_channels {vm_name} {list}|all

  Enable or disable the communication channels in {list} (comma-separated)
  for the specified VM; alternatively, {list} can be replaced with the keyword 'all'.
  Disabled channels will still receive packets on the host, however the commands
  they specify will be ignored. Set status to 'enabled' to begin processing requests again:

  .. code-block:: console

    set_channel_status {vm_name} {list}|all enabled|disabled

  Print to the CLI the information on the specified VM. The information
  lists the number of vCPUs and the pinning to pCPU(s) as a bit mask, along with
  any communication channels associated with the VM and the status of each channel:

  .. code-block:: console

    show_vm {vm_name}

  Set the binding of a Virtual CPU on the VM with name {vm_name} to the Physical CPU mask:

  .. code-block:: console

    set_pcpu_mask {vm_name} {vcpu} {pcpu}

  Set the binding of a Virtual CPU on the VM to the Physical CPU:

  .. code-block:: console

    set_pcpu {vm_name} {vcpu} {pcpu}

Manual control and inspection can also be carried out in relation to CPU frequency scaling:

  Get the current frequency for each core specified in the mask:

  .. code-block:: console

    show_cpu_freq_mask {mask}

  Set the current frequency for the cores specified in {core_mask} by scaling each up/down/min/max:

  .. code-block:: console

    set_cpu_freq {core_mask} up|down|min|max

  Get the current frequency for the specified core:

  .. code-block:: console

    show_cpu_freq {core_num}

  Set the current frequency for the specified core by scaling up/down/min/max:

  .. code-block:: console

    set_cpu_freq {core_num} up|down|min|max

Compiling and Running the Guest Applications
--------------------------------------------

For compiling and running l3fwd-power, see Chapter 11 "L3 Forwarding with Power Management Application".

A guest CLI is also provided for validating the setup.

For both l3fwd-power and the guest CLI, the channels for the VM must be monitored by the
host application using the *add_channels* command on the host.

Compiling
~~~~~~~~~

#. export RTE_SDK=/path/to/rte_sdk
#. cd ${RTE_SDK}/examples/vm_power_manager/guest_cli
#. make

Running
~~~~~~~

The application does not have any specific command line options other than *EAL*:

.. code-block:: console

    ./build/guest_vm_power_mgr [EAL options]

For example purposes, the application uses a channel for each lcore enabled.
For example, to run on cores 0,1,2,3 on a system with 4 memory channels:

.. code-block:: console

    ./build/guest_vm_power_mgr -c 0xf -n 4

After successful initialization, the user is presented with the VM Power Manager Guest CLI:

.. code-block:: console

    vm_power(guest)>

To change the frequency of an lcore, use the *set_cpu_freq* command.
Here, {core_num} is the lcore and channel whose frequency is changed by scaling up/down/min/max:

.. code-block:: console

    set_cpu_freq {core_num} up|down|min|max

.. |vm_power_mgr_highlevel| image:: img/vm_power_mgr_highlevel.*

.. |vm_power_mgr_vm_request_seq| image:: img/vm_power_mgr_vm_request_seq.*
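
The request path described in the Overview, from a guest request to a host-side frequency
action, can be illustrated with a short, self-contained sketch. It is not the DPDK
implementation: the wire format (*REQUEST*), the command codes (*COMMANDS*) and the function
names are hypothetical, a socket pair stands in for the Virtio-Serial channel and its
AF_UNIX endpoint, and a dictionary stands in for the vCPU to pCPU lookup that the host
performs through libvirt. It requires Linux, since it uses epoll:

.. code-block:: python

    import select
    import socket
    import struct

    # Hypothetical wire format for this sketch only: vCPU number, command code.
    REQUEST = struct.Struct("<II")
    COMMANDS = {1: "up", 2: "down", 3: "min", 4: "max"}


    def encode_request(vcpu, command):
        """Guest side: pack a request carrying the vCPU and the power command."""
        code = {name: code for code, name in COMMANDS.items()}[command]
        return REQUEST.pack(vcpu, code)


    def handle_ready_channels(epoll, channels, vcpu_to_pcpu, timeout=1.0):
        """Host side: read one request per readable channel, map vCPU to pCPU."""
        actions = []
        for fd, events in epoll.poll(timeout):
            if events & select.EPOLLIN:
                data = channels[fd].recv(REQUEST.size)
                if len(data) != REQUEST.size:
                    continue  # ignore short or empty reads in this sketch
                vcpu, code = REQUEST.unpack(data)
                actions.append((vcpu_to_pcpu[vcpu], COMMANDS[code]))
        return actions


    def demo():
        """Send one request from the 'guest' end and process it on the 'host' end."""
        guest_end, host_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
        epoll = select.epoll()
        epoll.register(host_end.fileno(), select.EPOLLIN)
        try:
            guest_end.sendall(encode_request(vcpu=1, command="up"))
            channels = {host_end.fileno(): host_end}
            # Stand-in for the libvirt lookup: vCPU 0 -> pCPU 2, vCPU 1 -> pCPU 3.
            return handle_ready_channels(epoll, channels, {0: 2, 1: 3})
        finally:
            epoll.close()
            guest_end.close()
            host_end.close()

In this sketch, *demo()* returns the pair (pCPU 3, "up"), which is the point at which the
host application would hand the request to the host librte_power.
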