..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

VM Power Management Application
===============================

Introduction
------------

Applications running in virtual environments have an abstract view of
the underlying hardware on the host. In particular, applications cannot see
the binding of virtual to physical hardware.
When looking at CPU resourcing, the pinning of virtual CPUs (vCPUs) to
host physical CPUs (pCPUs) is not apparent to an application,
and this pinning may change over time.
Furthermore, operating systems on virtual machines do not have the ability
to govern their own power policy; the Model Specific Registers (MSRs)
for enabling P-State transitions are not exposed to operating systems
running on virtual machines (VMs).

The Virtual Machine Power Management solution shows an example of
how a DPDK application can indicate its processing requirements to a
host-based monitor using only VM-local information (vCPU/lcore, etc.).
The monitor is responsible for accepting requests for frequency changes
for a vCPU, translating the vCPU to a pCPU via libvirt, and effecting
the change in frequency.

The solution comprises two high-level components:

#. Example Host Application

   A Command Line Interface (CLI) for VM->Host communication channel management
   allows adding channels to the monitor, setting and querying the vCPU to pCPU pinning,
   and inspecting and manually changing the frequency for each CPU.
   The CLI runs on a single lcore while the thread responsible for managing
   VM requests runs on a second lcore.

   VM requests arriving on a channel for frequency changes are passed
   to the librte_power ACPI cpufreq sysfs based library.
   The Host Application relies on both qemu-kvm and libvirt to function.

   This monitoring application is responsible for:

   - Accepting requests from client applications: Client applications can
     request frequency changes for a vCPU. The host application translates
     the vCPU to a pCPU via libvirt and effects the change in frequency.

   - Accepting policies from client applications: A client application can
     send a policy to the host application. The
     host application will then apply the rules of the policy independent
     of the application. For example, the policy can contain time-of-day
     information for busy/quiet periods, and the host application can scale
     up/down the relevant cores when required. See the details of the guest
     application below for more information on setting the policy values.

   - Out-of-band monitoring of workloads via core hardware event counters:
     The host application can manage power for an application in a virtualised
     or non-virtualised environment by looking at the event counters of the
     cores and taking action based on the branch hit/miss ratio. See the host
     application ``--core-list`` command line parameter below.

#. librte_power for Virtual Machines

   Using an alternate implementation for the librte_power API, requests for
   frequency changes are forwarded to the host monitor rather than
   the ACPI cpufreq sysfs interface used on the host.

   The l3fwd-power application will use this implementation when deployed on a VM
   (see :doc:`l3_forward_power_man`).

.. _figure_vm_power_mgr_highlevel:

.. figure:: img/vm_power_mgr_highlevel.*

   High-level Solution


Overview
--------

VM Power Management employs qemu-kvm to provide communications channels
between the host and VMs in the form of Virtio-Serial, which appears as
a paravirtualized serial device on a VM and can be configured to use
various backends on the host.
For this example each Virtio-Serial endpoint
on the host is configured as an AF_UNIX file socket, supporting poll/select
and epoll for event notification.
In this example each channel endpoint on the host is monitored via
epoll for EPOLLIN events.
Each channel is specified as qemu-kvm arguments or as libvirt XML for each VM,
where each VM can have a number of channels up to a maximum of 64 per VM.
In this example each DPDK lcore on a VM has exclusive access to a channel.

To enable frequency changes from within a VM, a request via the librte_power interface
is forwarded via Virtio-Serial to the host. Each request contains the vCPU
and power command (scale up/down/min/max).
The API for host and guest librte_power is consistent across environments,
with the selection of VM or Host implementation determined automatically
at runtime based on the environment.

Upon receiving a request, the host translates the vCPU to a pCPU via
the libvirt API before forwarding it to the host librte_power.

.. _figure_vm_power_mgr_vm_request_seq:

.. figure:: img/vm_power_mgr_vm_request_seq.*

   VM request to scale frequency


Performance Considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~

While the Haswell microarchitecture allows for independent power control for each core,
earlier microarchitectures do not offer such fine-grained control.
When deployed on pre-Haswell platforms, greater care must be taken in selecting
which cores are assigned to a VM. For instance, a core will not scale down
until its sibling is similarly scaled.

Configuration
-------------

BIOS
~~~~

Enhanced Intel SpeedStep® Technology must be enabled in the platform BIOS
if the power management feature of DPDK is to be used.
Otherwise, the sysfs directory /sys/devices/system/cpu/cpu0/cpufreq will not exist,
and the CPU frequency-based power management cannot be used.
Consult the relevant BIOS documentation to determine how these settings
can be accessed.

Host Operating System
~~~~~~~~~~~~~~~~~~~~~

The Host OS must also have the *acpi_cpufreq* module installed. In some cases
the *intel_pstate* driver may be the default power management environment.
To enable *acpi_cpufreq* and disable *intel_pstate*, add the following
to the grub Linux command line:

.. code-block:: console

   intel_pstate=disable

Upon rebooting, load the *acpi_cpufreq* module:

.. code-block:: console

   modprobe acpi_cpufreq

Hypervisor Channel Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Virtio-Serial channels are configured via libvirt XML:


.. code-block:: xml

   <name>{vm_name}</name>
   <controller type='virtio-serial' index='0'>
     <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
   </controller>
   <channel type='unix'>
     <source mode='bind' path='/tmp/powermonitor/{vm_name}.{channel_num}'/>
     <target type='virtio' name='virtio.serial.port.poweragent.{vm_channel_num}'/>
     <address type='virtio-serial' controller='0' bus='0' port='{N}'/>
   </channel>


A single controller of type *virtio-serial* is created. Up to 32 channels
can be associated with a single controller, and multiple controllers can be specified.
The convention is to use the name of the VM in the host path *{vm_name}* and
to increment *{channel_num}* for each channel. Likewise, the port value *{N}*
must be incremented for each channel.

Each channel on the host will appear in *path*. The directory */tmp/powermonitor/*
must first be created and given qemu permissions:

.. code-block:: console

   mkdir /tmp/powermonitor/
   chown qemu:qemu /tmp/powermonitor

Note that files and directories within /tmp are generally removed upon
rebooting the host, so the above steps may need to be carried out after each reboot.

The serial device as it appears on a VM is configured with the *target* element attribute *name*
and must be in the form of *virtio.serial.port.poweragent.{vm_channel_num}*,
where *vm_channel_num* is typically the lcore channel to be used in DPDK VM applications.

Each channel on a VM will be present at */dev/virtio-ports/virtio.serial.port.poweragent.{vm_channel_num}*

Compiling and Running the Host Application
------------------------------------------

Compiling
~~~~~~~~~

For information on compiling DPDK and the sample applications
see :doc:`compiling`.

The application is located in the ``vm_power_manager`` sub-directory.

To build just the ``vm_power_manager`` application:

.. code-block:: console

   export RTE_SDK=/path/to/rte_sdk
   export RTE_TARGET=build
   cd ${RTE_SDK}/examples/vm_power_manager/
   make

Running
~~~~~~~

The application does not have any specific command line options other than *EAL*:

.. code-block:: console

   ./build/vm_power_mgr [EAL options]

The application requires exactly two cores to run. One core is dedicated to the CLI,
while the other is dedicated to the channel endpoint monitor. For example, to run
on cores 0 & 1 on a system with 4 memory channels:

.. code-block:: console

   ./build/vm_power_mgr -l 0-1 -n 4

After successful initialization the user is presented with the VM Power Manager CLI:

.. code-block:: console

   vm_power>

Virtual Machines can now be added to the VM Power Manager:

.. code-block:: console

   vm_power> add_vm {vm_name}

When a {vm_name} is specified with the *add_vm* command, a lookup is performed
with libvirt to ensure that the VM exists. {vm_name} is used as a unique identifier
to associate channels with a particular VM and for executing operations on a VM within the CLI.
VMs do not have to be running in order to add them.

A number of commands can be issued via the CLI in relation to VMs:

  Remove a Virtual Machine identified by {vm_name} from the VM Power Manager.

  .. code-block:: console

    rm_vm {vm_name}

  Add communication channels for the specified VM. The virtio channels must be enabled
  in the VM configuration (qemu/libvirt) and the associated VM must be active.
  {list} is a comma-separated list of channel numbers to add. Using the keyword 'all'
  will attempt to add all channels for the VM:

  .. code-block:: console

    add_channels {vm_name} {list}|all

  Enable or disable the communication channels in {list} (comma-separated)
  for the specified VM. Alternatively, {list} can be replaced with the keyword 'all'.
  Disabled channels will still receive packets on the host, but the commands
  they specify will be ignored. Set status to 'enabled' to begin processing requests again:

  .. code-block:: console

    set_channel_status {vm_name} {list}|all enabled|disabled

  Print to the CLI the information on the specified VM. The information
  lists the number of vCPUs, the pinning to pCPU(s) as a bit mask, and
  any communication channels associated with the VM, along with the status of each channel:

  .. code-block:: console

    show_vm {vm_name}

  Set the binding of a Virtual CPU on the VM with name {vm_name} to the Physical CPU mask:

  .. code-block:: console

    set_pcpu_mask {vm_name} {vcpu} {pcpu}

  Set the binding of a Virtual CPU on the VM to the Physical CPU:

  .. code-block:: console

    set_pcpu {vm_name} {vcpu} {pcpu}

Manual control and inspection can also be carried out in relation to CPU frequency scaling:

  Get the current frequency for each core specified in the mask:

  .. code-block:: console

    show_cpu_freq_mask {mask}

  Set the current frequency for the cores specified in {core_mask} by scaling each up/down/min/max:

  .. code-block:: console

    set_cpu_freq {core_mask} up|down|min|max

  Get the current frequency for the specified core:

  .. code-block:: console

    show_cpu_freq {core_num}

  Set the current frequency for the specified core by scaling up/down/min/max:

  .. code-block:: console

    set_cpu_freq {core_num} up|down|min|max

There are also some command line parameters for enabling the out-of-band
monitoring of branch ratio on cores doing busy polling via PMDs.

  .. code-block:: console

    --core-list {list of cores}

  When this parameter is used, the specified cores are monitored for the ratio
  between branch hits and branch misses. A tightly polling PMD thread will have a
  very low branch ratio, so the core frequency will be scaled down to the minimum
  allowed value. When packets are received, the code path will alter, causing the
  branch ratio to increase. When the ratio goes above the ratio threshold, the
  core frequency will be scaled up to the maximum allowed value.

  .. code-block:: console

    --branch-ratio {ratio}

  The branch ratio is a floating point number that specifies the threshold at which
  to scale up or down for the given workload. The default branch ratio is 0.01,
  and will need to be adjusted for different workloads.
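
The scaling decision described above can be sketched in a few lines of Python.
This is an illustration only, not the application's actual code: the function
name ``branch_ratio_action`` and its parameters are hypothetical, and the ratio
is assumed to be branch misses divided by branch hits. Only the default
threshold of 0.01 and the min/max outcomes are taken from the text above.

.. code-block:: python

   def branch_ratio_action(branch_hits, branch_misses, threshold=0.01):
       """Return "max" or "min" for one sampling interval of one core.

       Hypothetical helper; assumes ratio = branch_misses / branch_hits.
       """
       if branch_hits <= 0:
           # Assumption: with no branch hits counted, treat the core as idle.
           return "min"
       ratio = branch_misses / branch_hits
       # A ratio above the threshold suggests the code path has changed,
       # for example because packets are now being processed.
       return "max" if ratio > threshold else "min"


   if __name__ == "__main__":
       # Tight polling loop: nearly every branch goes the same way.
       print(branch_ratio_action(1_000_000, 100))     # ratio 0.0001
       # Packet processing: proportionally more branch misses.
       print(branch_ratio_action(1_000_000, 50_000))  # ratio 0.05

Raising the threshold in this sketch makes the scale-up less sensitive, which
mirrors the note above that the ratio needs tuning per workload.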



JSON API
~~~~~~~~

In addition to the command line interface for host commands and a virtio-serial
interface for VM power policies, there is also a JSON interface through which
power commands and policies can be sent. This functionality adds a dependency
on the Jansson library, and the Jansson development package must be installed
on the system before the JSON parsing functionality is included in the app.
This is achieved by:

  .. code-block:: console

    apt-get install libjansson-dev

The command and package name may be different depending on your operating
system. The app will build successfully without this
package present, but a warning is shown during compilation, and the JSON
parsing functionality will not be present in the app.

Sending a command or policy to the power manager application is achieved by
opening a fifo file, writing a JSON string to that fifo, and closing
the file.

The fifo is at /tmp/powermonitor/fifo

The JSON string can be a policy or instruction, and takes the following
format:

  .. code-block:: javascript

    {"packet_type": {
      "pair_1": value,
      "pair_2": value
    }}

The 'packet_type' header can contain one of two values, depending on
whether a policy or power command is being sent. The two possible values are
"policy" and "instruction", and the expected name-value pairs are different
depending on which type is being sent.

The pairs are in the format of standard JSON name-value pairs. The value type
varies between the different name/value pairs, and may be integers, strings,
arrays, etc. Examples of policies follow later in this document. The allowed
names and value types are as follows:


:Pair Name: "name"
:Description: Name of the VM or Host. Allows the parser to associate the
  policy with the relevant VM or Host OS.
:Type: string
:Values: any valid string
:Required: yes
:Example:

    .. code-block:: javascript

      "name": "ubuntu2"


:Pair Name: "command"
:Description: The type of packet we're sending to the power manager. We can be
  creating or destroying a policy, or sending a direct command to adjust
  the frequency of a core, similar to the command line interface.
:Type: string
:Values:

  :CREATE: used when creating a new policy,
  :DESTROY: used when removing a policy,
  :POWER: used when sending an immediate command, max, min, etc.
:Required: yes
:Example:

    .. code-block:: javascript

      "command": "CREATE"


:Pair Name: "policy_type"
:Description: Type of policy to apply. Please see vm_power_manager documentation
  for more information on the types of policies that may be used.
:Type: string
:Values:

  :TIME: Time-of-day policy. Frequencies of the relevant cores are
    scaled up/down depending on busy and quiet hours.
  :TRAFFIC: This policy takes statistics from the NIC and scales up
    and down accordingly.
  :WORKLOAD: This policy looks at how heavily loaded the cores are,
    and scales up and down accordingly.
  :BRANCH_RATIO: This out-of-band policy can look at the ratio between
    branch hits and misses on a core, and is useful for detecting
    how much packet processing a core is doing.
:Required: only for CREATE/DESTROY command
:Example:

    .. code-block:: javascript

      "policy_type": "TIME"

:Pair Name: "busy_hours"
:Description: The hours of the day in which we scale up the cores for busy
  times.
:Type: array of integers
:Values: array with list of hour numbers, (0-23)
:Required: only for TIME policy
:Example:

    .. code-block:: javascript

      "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ]

:Pair Name: "quiet_hours"
:Description: The hours of the day in which we scale down the cores for quiet
  times.
:Type: array of integers
:Values: array with list of hour numbers, (0-23)
:Required: only for TIME policy
:Example:

    .. code-block:: javascript

      "quiet_hours":[ 2, 3, 4, 5, 6 ]

:Pair Name: "avg_packet_thresh"
:Description: Threshold below which the frequency will be set to min for
  the TRAFFIC policy. If the traffic rate is above this and below max, the
  frequency will be set to medium.
:Type: integer
:Values: The number of packets below which the TRAFFIC policy applies the
  minimum frequency, or medium frequency if between avg and max thresholds.
:Required: only for TRAFFIC policy
:Example:

    .. code-block:: javascript

      "avg_packet_thresh": 100000

:Pair Name: "max_packet_thresh"
:Description: Threshold above which the frequency will be set to max for
  the TRAFFIC policy
:Type: integer
:Values: The number of packets per interval above which the TRAFFIC policy
  applies the maximum frequency
:Required: only for TRAFFIC policy
:Example:

    .. code-block:: javascript

      "max_packet_thresh": 500000

:Pair Name: "core_list"
:Description: The cores to which to apply the policy.
:Type: array of integers
:Values: array with list of virtual CPUs.
:Required: only for policy CREATE/DESTROY
:Example:

    .. code-block:: javascript

      "core_list":[ 10, 11 ]

:Pair Name: "workload"
:Description: When our policy is of type WORKLOAD, we need to specify how
  heavy our workload is.
:Type: string
:Values:

  :HIGH: For cores running workloads that require high frequencies
  :MEDIUM: For cores running workloads that require medium frequencies
  :LOW: For cores running workloads that require low frequencies
:Required: only for WORKLOAD policy types
:Example:

    .. code-block:: javascript

      "workload": "MEDIUM"

:Pair Name: "mac_list"
:Description: When our policy is of type TRAFFIC, we need to specify the
  MAC addresses that the host needs to monitor
:Type: array of strings
:Values: array with a list of MAC address strings.
:Required: only for TRAFFIC policy types
:Example:

    .. code-block:: javascript

      "mac_list":[ "de:ad:be:ef:01:01", "de:ad:be:ef:01:02" ]

:Pair Name: "unit"
:Description: The type of power operation to apply in the command
:Type: string
:Values:

  :SCALE_MAX: Scale frequency of this core to maximum
  :SCALE_MIN: Scale frequency of this core to minimum
  :SCALE_UP: Scale up frequency of this core
  :SCALE_DOWN: Scale down frequency of this core
  :ENABLE_TURBO: Enable Turbo Boost for this core
  :DISABLE_TURBO: Disable Turbo Boost for this core
:Required: only for POWER instruction
:Example:

    .. code-block:: javascript

      "unit": "SCALE_MAX"

:Pair Name: "resource_id"
:Description: The core to which to apply the power command.
:Type: integer
:Values: valid core id for VM or host OS.
:Required: only for POWER instruction
:Example:

    .. code-block:: javascript

      "resource_id": 10

JSON API Examples
~~~~~~~~~~~~~~~~~

Policy create example:

  .. code-block:: javascript

    {"policy": {
      "name": "ubuntu",
      "command": "create",
      "policy_type": "TIME",
      "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ],
      "quiet_hours":[ 2, 3, 4, 5, 6 ],
      "core_list":[ 11 ]
    }}

Policy destroy example:

  .. code-block:: javascript

    {"policy": {
      "name": "ubuntu",
      "command": "destroy"
    }}

Power command example:

  .. code-block:: javascript

    {"instruction": {
      "name": "ubuntu",
      "unit": "SCALE_MAX",
      "resource_id": 10
    }}

To send a JSON string to the Power Manager application, paste the
example JSON string into a text file and cat it into the fifo:

  .. code-block:: console

    cat file.json >/tmp/powermonitor/fifo

The console of the Power Manager application should indicate the command that
was just received via the fifo.

Compiling and Running the Guest Applications
--------------------------------------------

l3fwd-power is one sample application that can be used with vm_power_manager.

A guest CLI is also provided for validating the setup.

For both l3fwd-power and the guest CLI, the channels for the VM must be monitored by the
host application using the *add_channels* command on the host. This typically uses
the following commands in the host application:

.. code-block:: console

   vm_power> add_vm vmname
   vm_power> add_channels vmname all
   vm_power> set_channel_status vmname all enabled
   vm_power> show_vm vmname


Compiling
~~~~~~~~~

For information on compiling DPDK and the sample applications
see :doc:`compiling`.

For compiling and running l3fwd-power, see :doc:`l3_forward_power_man`.

The application is located in the ``guest_cli`` sub-directory under ``vm_power_manager``.

To build just the ``guest_vm_power_manager`` application:

.. code-block:: console

   export RTE_SDK=/path/to/rte_sdk
   export RTE_TARGET=build
   cd ${RTE_SDK}/examples/vm_power_manager/guest_cli/
   make

Running
~~~~~~~

The standard *EAL* command line parameters are required:

.. code-block:: console

   ./build/guest_vm_power_mgr [EAL options] -- [guest options]

The guest example uses a channel for each lcore enabled. For example,
to run on cores 0,1,2,3:

.. code-block:: console

   ./build/guest_vm_power_mgr -l 0-3

Optionally, there are a number of command line parameters for use when the user wishes
to send a power policy down to the host application. These parameters are as follows:

  .. code-block:: console

    --vm-name {name of guest vm}

  This parameter allows the user to change the Virtual Machine name passed down to the
  host application via the power policy. The default is "ubuntu2".

  .. code-block:: console

    --vcpu-list {list vm cores}

  A comma-separated list of cores in the VM that the user wants the host application to
  monitor. The list of cores in any VM starts at zero, and these are mapped to the
  physical cores by the host application once the policy is passed down.
  Valid syntax includes individual cores '2,3,4', or a range of cores '2-4', or a
  combination of both '1,3,5-7'.

  .. code-block:: console

    --busy-hours {list of busy hours}

  A comma-separated list of hours within which to set the core frequency to maximum.
  Valid syntax includes individual hours '2,3,4', or a range of hours '2-4', or a
  combination of both '1,3,5-7'. Valid hours are 0 to 23.

  .. code-block:: console

    --quiet-hours {list of quiet hours}

  A comma-separated list of hours within which to set the core frequency to minimum.
  Valid syntax includes individual hours '2,3,4', or a range of hours '2-4', or a
  combination of both '1,3,5-7'. Valid hours are 0 to 23.

  .. code-block:: console

    --policy {policy type}

  The type of policy. This can be one of the following values:
  TRAFFIC - based on incoming traffic rates on the NIC.
  TIME - busy/quiet hours policy.
  BRANCH_RATIO - uses branch ratio counters to determine core busyness.
  Not all parameters are needed for all policy types. For example, BRANCH_RATIO
  only needs the vcpu-list parameter, not any of the hours.


After successful initialization the user is presented with the VM Power Manager Guest CLI:

.. code-block:: console

   vm_power(guest)>

To change the frequency of an lcore, use the set_cpu_freq command,
where {core_num} is the lcore and channel to change frequency by scaling up/down/min/max.

.. code-block:: console

   set_cpu_freq {core_num} up|down|min|max

To start the application, configure the power policy, and send it to the host:

.. code-block:: console

   ./build/guest_vm_power_mgr -l 0-3 -n 4 -- --vm-name=ubuntu --policy=BRANCH_RATIO --vcpu-list=2-4

Once the VM Power Manager Guest CLI appears, issuing the 'send_policy now' command
will send the policy to the host:

.. code-block:: console

   send_policy now

Once the policy is sent to the host, the host application takes over the power monitoring
of the specified cores in the policy.
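
The list syntax accepted by ``--vcpu-list``, ``--busy-hours`` and ``--quiet-hours``
(individual values, ranges, or a mix such as '1,3,5-7') can be illustrated with a
short Python sketch. The function ``parse_list`` is hypothetical and is not the
parser used by the guest application. It only shows how such a string expands into
individual values, and it enforces the 0 to 23 limit stated for hours when
``max_value=23`` is passed.

.. code-block:: python

   def parse_list(text, max_value=None):
       """Expand a string such as '1,3,5-7' into a list of integers.

       Hypothetical helper for illustration; duplicates are kept as given.
       """
       values = []
       for part in text.split(","):
           part = part.strip()
           if "-" in part:
               # A range such as '5-7' includes both end points.
               first, last = (int(x) for x in part.split("-", 1))
           else:
               first = last = int(part)
           if first > last:
               raise ValueError("range start exceeds range end: " + part)
           values.extend(range(first, last + 1))
       if max_value is not None and any(v > max_value for v in values):
           raise ValueError("value above allowed maximum of %d" % max_value)
       return values


   if __name__ == "__main__":
       print(parse_list("1,3,5-7"))               # [1, 3, 5, 6, 7]
       print(parse_list("17-23", max_value=23))   # busy hours, 0 to 23 allowed

With ``--vcpu-list=2-4`` from the example command line above, the same expansion
would yield vCPUs 2, 3 and 4.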