..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

VM Power Management Application
===============================

Introduction
------------

Applications running in virtual environments have an abstract view of
the underlying hardware on the host. In particular, applications cannot see
the binding of virtual to physical hardware.
When looking at CPU resourcing, the pinning of Virtual CPUs (vCPUs) to
Host Physical CPUs (pCPUs) is not apparent to an application,
and this pinning may change over time.
Furthermore, operating systems on virtual machines do not have the ability
to govern their own power policy; the Machine Specific Registers (MSRs)
for enabling P-State transitions are not exposed to operating systems
running on Virtual Machines (VMs).

The Virtual Machine Power Management solution shows an example of
how a DPDK application can indicate its processing requirements, using only
VM-local information (vCPU/lcore, etc.), to a host-based monitor. The monitor
is responsible for accepting requests for frequency changes for a vCPU,
translating the vCPU to a pCPU via libvirt, and effecting the change in frequency.

The solution consists of two high-level components:

#. Example Host Application

   Using a Command Line Interface (CLI) for VM->Host communication channel management
   allows adding channels to the Monitor, setting and querying the vCPU to pCPU pinning,
   and inspecting and manually changing the frequency for each CPU.
   The CLI runs on a single lcore, while the thread responsible for managing
   VM requests runs on a second lcore.

   VM requests arriving on a channel for frequency changes are passed
   to the librte_power ACPI cpufreq sysfs-based library.
   The Host Application relies on both qemu-kvm and libvirt to function.

   This monitoring application is responsible for:

   - Accepting requests from client applications: Client applications can
     request frequency changes for a vCPU; the host application translates
     the vCPU to a pCPU via libvirt and effects the change in frequency.

   - Accepting policies from client applications: Client applications can
     send a policy to the host application. The
     host application will then apply the rules of the policy independently
     of the application. For example, the policy can contain time-of-day
     information for busy/quiet periods, and the host application can scale
     up/down the relevant cores when required. See the details of the guest
     application below for more information on setting the policy values.

   - Out-of-band monitoring of workloads via the cores' hardware event counters:
     The host application can manage power for an application in a virtualised
     or non-virtualised environment by looking at the event counters of the
     cores and taking action based on the branch hit/miss ratio. See the host
     application '--core-list' command line parameter below.

#. librte_power for Virtual Machines

   Using an alternate implementation for the librte_power API, requests for
   frequency changes are forwarded to the host monitor rather than
   the ACPI cpufreq sysfs interface used on the host.

   The l3fwd-power application will use this implementation when deployed on a VM
   (see :doc:`l3_forward_power_man`).

.. _figure_vm_power_mgr_highlevel:

.. figure:: img/vm_power_mgr_highlevel.*

   High-level Solution


Overview
--------

VM Power Management employs qemu-kvm to provide communication channels
between the host and VMs in the form of Virtio-Serial, which appears as
a paravirtualized serial device on a VM and can be configured to use
various backends on the host. For this example, each Virtio-Serial endpoint
on the host is configured as an AF_UNIX file socket, supporting poll/select
and epoll for event notification.
In this example, each channel endpoint on the host is monitored via
epoll for EPOLLIN events.
Each channel is specified as qemu-kvm arguments or as libvirt XML for each VM,
and each VM can have up to a maximum of 64 channels.
In this example, each DPDK lcore on a VM has exclusive access to a channel.

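The epoll-based monitoring described above can be sketched with plain Linux system calls. This is a minimal illustration, not the host application's code: the helper names ``add_channel_fd`` and ``poll_channels_once`` are hypothetical, and in the test a pipe stands in for the AF_UNIX channel socket so that the example runs without a VM.

.. code-block:: c

  #include <sys/epoll.h>
  #include <unistd.h>

  /* Register one channel endpoint for EPOLLIN (data available) events. */
  static int add_channel_fd(int epfd, int fd)
  {
      struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

      return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
  }

  /* Wait up to timeout_ms for one channel to become readable and read the
   * pending bytes into buf. Returns the number of bytes read and stores the
   * channel fd in *ready_fd, or returns -1 on timeout or error. */
  static ssize_t poll_channels_once(int epfd, void *buf, size_t len,
                                    int timeout_ms, int *ready_fd)
  {
      struct epoll_event ev;

      if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
          return -1;
      *ready_fd = ev.data.fd;
      return read(ev.data.fd, buf, len);
  }

A monitor built this way registers one fd per channel with the same epoll instance, so a single thread can service all VMs; the returned fd identifies which VM and channel the request came from.
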
To enable frequency changes from within a VM, a request made via the librte_power
interface is forwarded via Virtio-Serial to the host. Each request contains the vCPU
and the power command (scale up/down/min/max).
The API for host and guest librte_power is consistent across environments,
with the selection of the VM or Host implementation determined automatically
at runtime based on the environment.

Upon receiving a request, the host translates the vCPU to a pCPU via
the libvirt API before forwarding it to the host librte_power.

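To make the request contents concrete, the sketch below models a request as a vCPU number plus one of the four scaling commands, with a fixed-size wire encoding. The ``struct power_request`` layout, the enum values and the little-endian encoding are hypothetical and chosen only for illustration; they are not the actual DPDK channel packet format.

.. code-block:: c

  #include <stdint.h>

  /* Hypothetical power commands, mirroring scale up/down/min/max. */
  enum power_cmd { CMD_SCALE_UP = 1, CMD_SCALE_DOWN, CMD_SCALE_MIN, CMD_SCALE_MAX };

  /* Hypothetical request: the VM only knows its own vCPU number. */
  struct power_request {
      uint32_t vcpu;
      uint32_t cmd; /* one of enum power_cmd */
  };

  #define REQ_WIRE_SIZE 8

  /* Little-endian helpers, independent of the host byte order. */
  static void put_u32(uint8_t *p, uint32_t v)
  {
      p[0] = v & 0xff;
      p[1] = (v >> 8) & 0xff;
      p[2] = (v >> 16) & 0xff;
      p[3] = (v >> 24) & 0xff;
  }

  static uint32_t get_u32(const uint8_t *p)
  {
      return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
             (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
  }

  /* Serialise a request into buf (REQ_WIRE_SIZE bytes). */
  static void encode_request(const struct power_request *req, uint8_t *buf)
  {
      put_u32(buf, req->vcpu);
      put_u32(buf + 4, req->cmd);
  }

  /* Parse a request; returns 0 on success, -1 if the command is unknown. */
  static int decode_request(const uint8_t *buf, struct power_request *req)
  {
      req->vcpu = get_u32(buf);
      req->cmd = get_u32(buf + 4);
      if (req->cmd < CMD_SCALE_UP || req->cmd > CMD_SCALE_MAX)
          return -1;
      return 0;
  }

Note that the request carries only the vCPU; mapping it to a pCPU is left entirely to the host, which is what keeps the guest independent of the current pinning.
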
.. _figure_vm_power_mgr_vm_request_seq:

.. figure:: img/vm_power_mgr_vm_request_seq.*

   VM request to scale frequency


Performance Considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~

While the Haswell microarchitecture allows for independent power control for each core,
earlier microarchitectures do not offer such fine-grained control.
When deployed on pre-Haswell platforms, greater care must be taken in selecting
which cores are assigned to a VM; for instance, a core will not scale down
until its sibling is similarly scaled.

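The sibling constraint can be illustrated with a simplified model which assumes, for this sketch only, that cores sharing a frequency domain all run at the highest frequency requested by any of them. The function name ``domain_freq_khz`` and the kHz values used with it are hypothetical.

.. code-block:: c

  #include <stddef.h>

  /* Simplified model: every core in a shared frequency domain runs at the
   * highest frequency requested by any core in that domain. */
  static unsigned int domain_freq_khz(const unsigned int *requested_khz,
                                      size_t ncores)
  {
      unsigned int f = 0;

      for (size_t i = 0; i < ncores; i++)
          if (requested_khz[i] > f)
              f = requested_khz[i];
      return f;
  }

Under this model, scaling down one vCPU has no visible effect while a vCPU pinned to a sibling core, possibly belonging to another VM, still requests a high frequency.
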
Configuration
-------------

BIOS
~~~~

Enhanced Intel SpeedStep® Technology must be enabled in the platform BIOS
if the power management feature of DPDK is to be used.
Otherwise, the sysfs directory /sys/devices/system/cpu/cpu0/cpufreq will not exist,
and CPU frequency-based power management cannot be used.
Consult the relevant BIOS documentation to determine how these settings
can be accessed.

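A quick way to confirm the setting from a program is to test whether the cpufreq directory exists. The sketch below is an illustrative check, not part of the sample application; ``cpufreq_available`` is a hypothetical helper that takes the path as a parameter so that it can be pointed at any CPU.

.. code-block:: c

  #include <stdbool.h>
  #include <sys/stat.h>

  /* Returns true if path exists and is a directory, e.g.
   * "/sys/devices/system/cpu/cpu0/cpufreq". */
  static bool cpufreq_available(const char *path)
  {
      struct stat st;

      return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
  }

On a host where SpeedStep is disabled in the BIOS, or where no cpufreq driver is loaded, calling it with the cpu0 path above returns false.
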
Host Operating System
~~~~~~~~~~~~~~~~~~~~~

The host OS must also have the *acpi_cpufreq* module installed; in some cases
the *intel_pstate* driver may be the default power management environment.
To enable *acpi_cpufreq* and disable *intel_pstate*, add the following
to the grub Linux command line:

.. code-block:: console

  intel_pstate=disable

Upon rebooting, load the *acpi_cpufreq* module:

.. code-block:: console

  modprobe acpi_cpufreq

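To verify which driver is active after rebooting, the ``scaling_driver`` file in the cpufreq sysfs directory can be read: it reports ``acpi-cpufreq`` for the ACPI driver and ``intel_pstate`` for the P-state driver. The sketch below is a hypothetical helper, ``driver_is``, that compares the first line of such a file with an expected driver name.

.. code-block:: c

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  /* Read the first line of a sysfs file such as
   * /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver and compare it
   * with the expected driver name (e.g. "acpi-cpufreq"). */
  static bool driver_is(const char *path, const char *expected)
  {
      char line[64];
      FILE *f = fopen(path, "r");

      if (f == NULL)
          return false;
      if (fgets(line, sizeof(line), f) == NULL) {
          fclose(f);
          return false;
      }
      fclose(f);
      line[strcspn(line, "\n")] = '\0'; /* strip the trailing newline */
      return strcmp(line, expected) == 0;
  }
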
Hypervisor Channel Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Virtio-Serial channels are configured via libvirt XML:

.. code-block:: xml

  <name>{vm_name}</name>
  <controller type='virtio-serial' index='0'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
  </controller>
  <channel type='unix'>
    <source mode='bind' path='/tmp/powermonitor/{vm_name}.{channel_num}'/>
    <target type='virtio' name='virtio.serial.port.poweragent.{vm_channel_num}'/>
    <address type='virtio-serial' controller='0' bus='0' port='{N}'/>
  </channel>

A single controller of type *virtio-serial* is created. Up to 32 channels
can be associated with a single controller, and multiple controllers can be specified.
The convention is to use the name of the VM in the host path *{vm_name}* and
to increment *{channel_num}* for each channel; likewise, the port value *{N}*
must be incremented for each channel.

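Because each channel needs its own path, target name and port, the ``<channel>`` elements are often generated rather than written by hand. The sketch below prints one element per channel following the convention above. It assumes that ``{channel_num}`` and ``{vm_channel_num}`` use the same index and that ports are numbered from 1, since port 0 of a virtio-serial controller is commonly left for a console; ``channel_xml`` is a hypothetical name.

.. code-block:: c

  #include <stdio.h>
  #include <string.h>

  /* Write the libvirt <channel> element for one power-agent channel into
   * buf. Returns the length snprintf reports, or -1 if vm_name contains
   * characters that would break the XML attribute or the host path. */
  static int channel_xml(char *buf, size_t len, const char *vm_name,
                         unsigned int ch)
  {
      if (strpbrk(vm_name, "'/<&") != NULL)
          return -1;
      return snprintf(buf, len,
          "<channel type='unix'>\n"
          "  <source mode='bind' path='/tmp/powermonitor/%s.%u'/>\n"
          "  <target type='virtio' name='virtio.serial.port.poweragent.%u'/>\n"
          "  <address type='virtio-serial' controller='0' bus='0' port='%u'/>\n"
          "</channel>\n",
          vm_name, ch, ch, ch + 1); /* assumption: ports start at 1 */
  }

Calling it for ``ch = 0 .. n-1`` and placing the output inside the domain's ``<devices>`` element yields consecutively numbered channels; beyond 32 channels, a second controller is needed.
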
Each channel on the host will appear in *path*. The directory */tmp/powermonitor/*
must first be created and given qemu permissions:

.. code-block:: console

  mkdir /tmp/powermonitor/
  chown qemu:qemu /tmp/powermonitor

Note that files and directories within /tmp are generally removed upon
rebooting the host, so the above steps may need to be carried out after each reboot.

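If the directory has to be recreated on every boot, the two commands can also be performed by a small setup program. This sketch assumes that it runs as root and that a ``qemu`` user exists (the name varies between distributions, e.g. ``libvirt-qemu`` on Debian-based systems); it uses that user's primary group, and passing ``NULL`` as the owner skips the ``chown`` step. ``ensure_monitor_dir`` is a hypothetical name.

.. code-block:: c

  #include <errno.h>
  #include <pwd.h>
  #include <sys/stat.h>
  #include <unistd.h>

  /* Create dir if it does not already exist and, if owner is not NULL,
   * hand it to that user and the user's primary group.
   * Returns 0 on success, -1 on error. */
  static int ensure_monitor_dir(const char *dir, const char *owner)
  {
      struct passwd *pw;

      if (mkdir(dir, 0750) != 0 && errno != EEXIST)
          return -1;
      if (owner == NULL)
          return 0;
      pw = getpwnam(owner);
      if (pw == NULL)
          return -1;
      return chown(dir, pw->pw_uid, pw->pw_gid);
  }

For example, ``ensure_monitor_dir("/tmp/powermonitor", "qemu")`` corresponds to the ``mkdir`` and ``chown`` commands above.
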
The serial device, as it appears on a VM, is configured with the *name* attribute of the *target* element
and must be in the form *virtio.serial.port.poweragent.{vm_channel_num}*,
where *vm_channel_num* is typically the lcore channel to be used in DPDK VM applications.

Each channel on a VM will be present at */dev/virtio-ports/virtio.serial.port.poweragent.{vm_channel_num}*.

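Inside the guest, an application that follows the one-channel-per-lcore convention can derive the device path from its lcore id and open it. The sketch below shows that pattern with plain POSIX calls. It is not the librte_power guest implementation, and ``channel_dev_path`` and ``open_lcore_channel`` are hypothetical names.

.. code-block:: c

  #include <fcntl.h>
  #include <stdio.h>

  #define CHANNEL_PATH_FMT "/dev/virtio-ports/virtio.serial.port.poweragent.%u"

  /* Build the device path for the channel assigned to an lcore.
   * Returns 0 on success, -1 if buf is too small. */
  static int channel_dev_path(char *buf, size_t len, unsigned int lcore_id)
  {
      int n = snprintf(buf, len, CHANNEL_PATH_FMT, lcore_id);

      return (n < 0 || (size_t)n >= len) ? -1 : 0;
  }

  /* Open the channel used by this lcore; returns an fd or -1. */
  static int open_lcore_channel(unsigned int lcore_id)
  {
      char path[128];

      if (channel_dev_path(path, sizeof(path), lcore_id) != 0)
          return -1;
      return open(path, O_RDWR);
  }
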
Compiling and Running the Host Application
------------------------------------------

Compiling
~~~~~~~~~

For information on compiling DPDK and the sample applications
see :doc:`compiling`.

The application is located in the ``vm_power_manager`` sub-directory.

To build just the ``vm_power_manager`` application:

.. code-block:: console

  export RTE_SDK=/path/to/rte_sdk
  export RTE_TARGET=build
  cd ${RTE_SDK}/examples/vm_power_manager/
  make

Running
~~~~~~~

Apart from the out-of-band monitoring parameters described later in this section,
the application only takes the standard *EAL* command line options:

.. code-block:: console

 ./build/vm_power_mgr [EAL options]

The application requires exactly two cores to run: one core is dedicated to the CLI,
while the other is dedicated to the channel endpoint monitor. For example, to run
on cores 0 & 1 on a system with 4 memory channels:

.. code-block:: console

 ./build/vm_power_mgr -l 0-1 -n 4

After successful initialization, the user is presented with the VM Power Manager CLI:

.. code-block:: console

  vm_power>

Virtual Machines can now be added to the VM Power Manager:

.. code-block:: console

  vm_power> add_vm {vm_name}

When a {vm_name} is specified with the *add_vm* command, a lookup is performed
with libvirt to ensure that the VM exists. {vm_name} is used as a unique identifier
to associate channels with a particular VM and for executing operations on a VM within the CLI.
VMs do not have to be running in order to add them.

A number of commands can be issued via the CLI in relation to VMs:

  Remove a Virtual Machine identified by {vm_name} from the VM Power Manager.

  .. code-block:: console

    rm_vm {vm_name}

  Add communication channels for the specified VM. The virtio channels must be enabled
  in the VM configuration (qemu/libvirt) and the associated VM must be active.
  {list} is a comma-separated list of channel numbers to add; using the keyword 'all'
  will attempt to add all channels for the VM:

  .. code-block:: console

    add_channels {vm_name} {list}|all

  Enable or disable the communication channels in {list} (comma-separated)
  for the specified VM; alternatively, {list} can be replaced with the keyword 'all'.
  Disabled channels will still receive packets on the host; however, the commands
  they specify will be ignored. Set the status to 'enabled' to begin processing requests again:

  .. code-block:: console

    set_channel_status {vm_name} {list}|all enabled|disabled

  Print to the CLI the information on the specified VM. The information
  lists the number of vCPUs, the pinning to pCPU(s) as a bit mask, and
  any communication channels associated with the VM, along with the status of each channel:

  .. code-block:: console

    show_vm {vm_name}

  Set the binding of a Virtual CPU on the VM named {vm_name} to a Physical CPU mask:

  .. code-block:: console

    set_pcpu_mask {vm_name} {vcpu} {pcpu}

  Set the binding of a Virtual CPU on the VM to a Physical CPU:

  .. code-block:: console

    set_pcpu {vm_name} {vcpu} {pcpu}

Manual control and inspection can also be carried out in relation to CPU frequency scaling:

  Get the current frequency for each core specified in the mask:

  .. code-block:: console

    show_cpu_freq_mask {mask}

  Set the current frequency for the cores specified in {core_mask} by scaling each up/down/min/max:

  .. code-block:: console

    set_cpu_freq_mask {core_mask} up|down|min|max

  Get the current frequency for the specified core:

  .. code-block:: console

    show_cpu_freq {core_num}

  Set the current frequency for the specified core by scaling up/down/min/max:

  .. code-block:: console

    set_cpu_freq {core_num} up|down|min|max

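The ``{mask}`` and ``{core_mask}`` arguments are bit masks in which bit *n* selects core *n*. The sketch below shows how such a mask maps to core numbers, for example to check which cores a ``show_cpu_freq_mask`` call will report on. It assumes a 64-bit mask, and ``mask_to_cores`` is a hypothetical helper.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Expand a core mask (bit n set => core n selected) into an array of
   * core numbers. Returns the number of cores written (at most max). */
  static size_t mask_to_cores(uint64_t mask, unsigned int *cores, size_t max)
  {
      size_t n = 0;

      for (unsigned int core = 0; mask != 0 && n < max; core++, mask >>= 1)
          if (mask & 1)
              cores[n++] = core;
      return n;
  }

With this convention, a mask of 0x5 selects cores 0 and 2, and 0xF0 selects cores 4 to 7.
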
There are also some command line parameters for enabling the out-of-band
monitoring of the branch ratio on cores doing busy polling via PMDs:

  .. code-block:: console

    --core-list {list of cores}

  When this parameter is used, the host application monitors the ratio
  between branch hits and branch misses on each of the specified cores. A tightly polling PMD thread will have a
  very low branch ratio, so the core frequency will be scaled down to the minimum
  allowed value. When packets are received, the code path will alter, causing the
  branch ratio to increase. When the ratio goes above the ratio threshold, the
  core frequency will be scaled up to the maximum allowed value.

  .. code-block:: console

    --branch-ratio {ratio}

  The branch ratio is a floating point number that specifies the threshold at which
  to scale up or down for the given workload. The default branch ratio is 0.01,
  and may need to be adjusted for different workloads.

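The scaling decision described above amounts to comparing a per-interval ratio with the threshold. The sketch below assumes that the ratio is branch misses divided by branch hits over the sampling interval, which matches a tight polling loop giving a value near zero. It leaves out reading the hardware counters themselves, and ``branch_ratio_action`` is a hypothetical name.

.. code-block:: c

  #include <stdint.h>

  enum scale_action { SCALE_TO_MIN, SCALE_TO_MAX };

  /* Decide how to scale a core from counter deltas over one interval.
   * Assumption: ratio = misses / hits. A tight polling loop is highly
   * predictable, so its ratio stays near zero. */
  static enum scale_action branch_ratio_action(uint64_t hits, uint64_t misses,
                                               double threshold)
  {
      double ratio;

      if (hits == 0) /* no branches counted: treat the core as idle */
          return SCALE_TO_MIN;
      ratio = (double)misses / (double)hits;
      return ratio > threshold ? SCALE_TO_MAX : SCALE_TO_MIN;
  }

A practical controller may also want some hysteresis, for example requiring several consecutive samples on the same side of the threshold before changing frequency; that refinement is omitted here.
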
JSON API
~~~~~~~~

In addition to the command line interface for host commands and a virtio-serial
interface for VM power policies, there is also a JSON interface through which
power commands and policies can be sent. This functionality adds a dependency
on the Jansson library, and the Jansson development package must be installed
on the system before the JSON parsing functionality is included in the app.
On Debian-based systems, the package can be installed with:

  .. code-block:: console

    apt-get install libjansson-dev

The command and package name may be different depending on your operating
system. Note that the app will successfully build without this
package present, but a warning is shown during compilation, and the JSON
parsing functionality will not be present in the app.

Sending a command or policy to the power manager application is achieved by
simply opening a fifo file, writing a JSON string to that fifo, and closing
the file.

The fifo is located at ``/tmp/powermonitor/fifo``.

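The same open/write/close sequence can be performed from a program instead of the shell. The sketch below is illustrative; ``send_json`` is a hypothetical helper, and it assumes the power manager is already running, because opening a fifo for writing blocks until a reader has it open.

.. code-block:: c

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  /* Write a complete JSON string to the power manager fifo.
   * Returns 0 on success, -1 on error. */
  static int send_json(const char *fifo_path, const char *json)
  {
      size_t len = strlen(json);
      size_t off = 0;
      int fd = open(fifo_path, O_WRONLY);

      if (fd < 0)
          return -1;
      while (off < len) {
          ssize_t n = write(fd, json + off, len - off);

          if (n < 0) {
              close(fd);
              return -1;
          }
          off += (size_t)n;
      }
      return close(fd);
  }

Calling ``send_json("/tmp/powermonitor/fifo", text)`` with one of the example JSON strings from this section has the same effect as writing that string to the fifo from the shell.
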
The JSON string can be a policy or an instruction, and takes the following
format:

  .. code-block:: javascript

    {"packet_type": {
      "pair_1": value,
      "pair_2": value
    }}

The 'packet_type' header can contain one of two values, depending on
whether a policy or power command is being sent. The two possible values are
"policy" and "instruction", and the expected name-value pairs are different
depending on which type is being sent.

The pairs are in the format of standard JSON name-value pairs. The value type
varies between the different name-value pairs, and may be integers, strings,
arrays, etc. Examples of policies follow later in this document. The allowed
names and value types are as follows:


:Pair Name: "name"
:Description: Name of the VM or Host. Allows the parser to associate the
  policy with the relevant VM or Host OS.
:Type: string
:Values: any valid string
:Required: yes
:Example:

    .. code-block:: javascript

      "name": "ubuntu2"


:Pair Name: "command"
:Description: The type of packet we're sending to the power manager. We can be
  creating or destroying a policy, or sending a direct command to adjust
  the frequency of a core, similar to the command line interface.
:Type: string
:Values:

  :CREATE: used when creating a new policy,
  :DESTROY: used when removing a policy,
  :POWER: used when sending an immediate command, max, min, etc.
:Required: yes
:Example:

    .. code-block:: javascript

      "command": "CREATE"


:Pair Name: "policy_type"
:Description: Type of policy to apply. The supported policy types are
  listed below.
:Type: string
:Values:

  :TIME: Time-of-day policy. Frequencies of the relevant cores are
    scaled up/down depending on busy and quiet hours.
  :TRAFFIC: This policy takes statistics from the NIC and scales up
    and down accordingly.
  :WORKLOAD: This policy looks at how heavily loaded the cores are,
    and scales up and down accordingly.
  :BRANCH_RATIO: This out-of-band policy can look at the ratio between
    branch hits and misses on a core, and is useful for detecting
    how much packet processing a core is doing.
:Required: only for CREATE/DESTROY commands
:Example:

  .. code-block:: javascript

    "policy_type": "TIME"

:Pair Name: "busy_hours"
:Description: The hours of the day in which we scale up the cores for busy
  times.
:Type: array of integers
:Values: array with list of hour numbers, (0-23)
:Required: only for TIME policy
:Example:

  .. code-block:: javascript

    "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ]

:Pair Name: "quiet_hours"
:Description: The hours of the day in which we scale down the cores for quiet
  times.
:Type: array of integers
:Values: array with list of hour numbers, (0-23)
:Required: only for TIME policy
:Example:

  .. code-block:: javascript

    "quiet_hours":[ 2, 3, 4, 5, 6 ]

:Pair Name: "avg_packet_thresh"
:Description: Threshold below which the frequency will be set to min for
  the TRAFFIC policy. If the traffic rate is above this threshold and below
  max_packet_thresh, the frequency will be set to medium.
:Type: integer
:Values: The number of packets below which the TRAFFIC policy applies the
  minimum frequency, or the medium frequency if between the avg and max thresholds.
:Required: only for TRAFFIC policy
:Example:

  .. code-block:: javascript

    "avg_packet_thresh": 100000

:Pair Name: "max_packet_thresh"
:Description: Threshold above which the frequency will be set to max for
  the TRAFFIC policy.
:Type: integer
:Values: The number of packets per interval above which the TRAFFIC policy
  applies the maximum frequency
:Required: only for TRAFFIC policy
:Example:

  .. code-block:: javascript

    "max_packet_thresh": 500000

:Pair Name: "core_list"
:Description: The cores to which to apply the policy.
:Type: array of integers
:Values: array with list of virtual CPUs.
:Required: only for policy CREATE/DESTROY
:Example:

  .. code-block:: javascript

    "core_list":[ 10, 11 ]

:Pair Name: "workload"
:Description: When our policy is of type WORKLOAD, we need to specify how
  heavy our workload is.
:Type: string
:Values:

  :HIGH: For cores running workloads that require high frequencies
  :MEDIUM: For cores running workloads that require medium frequencies
  :LOW: For cores running workloads that require low frequencies
:Required: only for WORKLOAD policy types
:Example:

  .. code-block:: javascript

    "workload": "MEDIUM"

:Pair Name: "mac_list"
:Description: When our policy is of type TRAFFIC, we need to specify the
  MAC addresses that the host needs to monitor.
:Type: array of strings
:Values: array with a list of MAC address strings.
:Required: only for TRAFFIC policy types
:Example:

  .. code-block:: javascript

    "mac_list":[ "de:ad:be:ef:01:01", "de:ad:be:ef:01:02" ]

:Pair Name: "unit"
:Description: The type of power operation to apply in the command.
:Type: string
:Values:

  :SCALE_MAX: Scale frequency of this core to maximum
  :SCALE_MIN: Scale frequency of this core to minimum
  :SCALE_UP: Scale up frequency of this core
  :SCALE_DOWN: Scale down frequency of this core
  :ENABLE_TURBO: Enable Turbo Boost for this core
  :DISABLE_TURBO: Disable Turbo Boost for this core
:Required: only for POWER instruction
:Example:

  .. code-block:: javascript

    "unit": "SCALE_MAX"

:Pair Name: "resource_id"
:Description: The core to which to apply the power command.
:Type: integer
:Values: valid core id for VM or host OS.
:Required: only for POWER instruction
:Example:

  .. code-block:: javascript

    "resource_id": 10

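The two TRAFFIC thresholds combine into a three-level decision. The sketch below expresses the rule stated for ``avg_packet_thresh`` and ``max_packet_thresh`` as a function of a measured packet count per interval. How the count is obtained from the NIC is not shown, the names are hypothetical, and treating a rate exactly equal to either threshold as medium is an assumption of this sketch.

.. code-block:: c

  #include <stdint.h>

  enum freq_level { FREQ_MIN, FREQ_MEDIUM, FREQ_MAX };

  /* TRAFFIC policy rule: below avg_thresh -> min, above max_thresh -> max,
   * otherwise medium. Rates equal to a threshold fall into medium. */
  static enum freq_level traffic_level(uint64_t pkts_per_interval,
                                       uint64_t avg_thresh, uint64_t max_thresh)
  {
      if (pkts_per_interval < avg_thresh)
          return FREQ_MIN;
      if (pkts_per_interval > max_thresh)
          return FREQ_MAX;
      return FREQ_MEDIUM;
  }

With the example values above (100000 and 500000), 50000 packets per interval maps to the minimum frequency, 200000 to medium and 600000 to the maximum.
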
JSON API Examples
~~~~~~~~~~~~~~~~~

Policy create example:

  .. code-block:: javascript

    {"policy": {
      "name": "ubuntu",
      "command": "create",
      "policy_type": "TIME",
      "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ],
      "quiet_hours":[ 2, 3, 4, 5, 6 ],
      "core_list":[ 11 ]
    }}

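For a TIME policy such as the one in this example, the host decides, hour by hour, whether the listed cores should run fast or slow. The sketch below shows one way to evaluate ``busy_hours`` and ``quiet_hours`` for a given hour. Leaving the cores unchanged for hours that appear in neither list is an assumption, as are all the names.

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <time.h>

  enum time_action { TIME_KEEP, TIME_SCALE_MAX, TIME_SCALE_MIN };

  static bool hour_in(const int *hours, size_t n, int hour)
  {
      for (size_t i = 0; i < n; i++)
          if (hours[i] == hour)
              return true;
      return false;
  }

  /* Busy hours -> maximum frequency, quiet hours -> minimum frequency,
   * any other hour -> leave the cores as they are (assumption). */
  static enum time_action time_policy_action(const int *busy, size_t nbusy,
                                             const int *quiet, size_t nquiet,
                                             int hour)
  {
      if (hour_in(busy, nbusy, hour))
          return TIME_SCALE_MAX;
      if (hour_in(quiet, nquiet, hour))
          return TIME_SCALE_MIN;
      return TIME_KEEP;
  }

  /* Current local hour (0-23), or -1 if it cannot be determined. */
  static int current_hour(void)
  {
      time_t now = time(NULL);
      struct tm *tm = localtime(&now);

      return tm != NULL ? tm->tm_hour : -1;
  }
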
Policy destroy example:

  .. code-block:: javascript

    {"policy": {
      "name": "ubuntu",
      "command": "destroy"
    }}

Power command example:

  .. code-block:: javascript

    {"instruction": {
      "name": "ubuntu",
      "command": "power",
      "unit": "SCALE_MAX",
      "resource_id": 10
    }}

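When a program generates instructions, the string can be formatted directly. The sketch below builds an instruction with the same fields as the power command example using ``snprintf``. ``format_power_instruction`` is hypothetical, and rather than performing JSON escaping it rejects names containing quotes or backslashes; a JSON library such as Jansson is the more robust choice.

.. code-block:: c

  #include <stdio.h>
  #include <string.h>

  /* Format an "instruction" packet. Returns 0 on success, or -1 if a
   * string would need escaping or the output does not fit in buf. */
  static int format_power_instruction(char *buf, size_t len,
                                      const char *vm_name, const char *unit,
                                      unsigned int resource_id)
  {
      int n;

      if (strpbrk(vm_name, "\"\\") != NULL || strpbrk(unit, "\"\\") != NULL)
          return -1;
      n = snprintf(buf, len,
                   "{\"instruction\": {\"name\": \"%s\", \"command\": \"power\", "
                   "\"unit\": \"%s\", \"resource_id\": %u}}",
                   vm_name, unit, resource_id);
      return (n < 0 || (size_t)n >= len) ? -1 : 0;
  }
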
To send a JSON string to the Power Manager application, simply paste the
example JSON string into a text file and cat it into the fifo:

  .. code-block:: console

    cat file.json >/tmp/powermonitor/fifo

The console of the Power Manager application should indicate the command that
was just received via the fifo.

Compiling and Running the Guest Applications
--------------------------------------------

l3fwd-power is one sample application that can be used with vm_power_manager.

A guest CLI is also provided for validating the setup.

For both l3fwd-power and the guest CLI, the channels for the VM must be monitored by the
host application using the *add_channels* command on the host. This typically uses
the following commands in the host application:

.. code-block:: console

  vm_power> add_vm vmname
  vm_power> add_channels vmname all
  vm_power> set_channel_status vmname all enabled
  vm_power> show_vm vmname


Compiling
~~~~~~~~~

For information on compiling DPDK and the sample applications
see :doc:`compiling`.

For compiling and running l3fwd-power, see :doc:`l3_forward_power_man`.

The application is located in the ``guest_cli`` sub-directory under ``vm_power_manager``.

To build just the ``guest_vm_power_mgr`` application:

.. code-block:: console

  export RTE_SDK=/path/to/rte_sdk
  export RTE_TARGET=build
  cd ${RTE_SDK}/examples/vm_power_manager/guest_cli/
  make

Running
~~~~~~~

The standard *EAL* command line parameters are required:

.. code-block:: console

 ./build/guest_vm_power_mgr [EAL options] -- [guest options]

The guest example uses a channel for each enabled lcore. For example,
to run on cores 0,1,2,3:

.. code-block:: console

 ./build/guest_vm_power_mgr -l 0-3

Optionally, a set of command line parameters is available should the user wish to send a power
policy down to the host application. These parameters are as follows:

  .. code-block:: console

    --vm-name {name of guest vm}

  This parameter allows the user to change the Virtual Machine name passed down to the
  host application via the power policy. The default is "ubuntu2".

  .. code-block:: console

    --vcpu-list {list vm cores}

  A comma-separated list of cores in the VM that the user wants the host application to
  monitor. The list of cores in any VM starts at zero, and these are mapped to the
  physical cores by the host application once the policy is passed down.
  Valid syntax includes individual cores '2,3,4', or a range of cores '2-4', or a
  combination of both '1,3,5-7'.

  .. code-block:: console

    --busy-hours {list of busy hours}

  A comma-separated list of hours within which to set the core frequency to maximum.
  Valid syntax includes individual hours '2,3,4', or a range of hours '2-4', or a
  combination of both '1,3,5-7'. Valid hours are 0 to 23.

  .. code-block:: console

    --quiet-hours {list of quiet hours}

  A comma-separated list of hours within which to set the core frequency to minimum.
  Valid syntax includes individual hours '2,3,4', or a range of hours '2-4', or a
  combination of both '1,3,5-7'. Valid hours are 0 to 23.

  .. code-block:: console

    --policy {policy type}

  The type of policy. This can be one of the following values:

  - TRAFFIC: based on incoming traffic rates on the NIC.
  - TIME: busy/quiet hours policy.
  - BRANCH_RATIO: uses branch ratio counters to determine core busyness.

  Not all parameters are needed for all policy types. For example, BRANCH_RATIO
  only needs the ``--vcpu-list`` parameter, not any of the hours.

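The list syntax accepted by ``--vcpu-list``, ``--busy-hours`` and ``--quiet-hours`` can be parsed into a bit mask as sketched below. This is an illustrative parser, not the one used by the guest application; ``parse_list`` is a hypothetical name, and the caller passes the upper bound (for example 23 for hours).

.. code-block:: c

  #include <ctype.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Parse "1,3,5-7" style lists into a bit mask (bit n set => n listed).
   * Every value must be <= max_val, with max_val < 64. Returns 0 on
   * success, -1 on an empty list, a syntax error or an out-of-range value. */
  static int parse_list(const char *s, unsigned int max_val, uint64_t *mask)
  {
      *mask = 0;
      if (*s == '\0')
          return -1;
      while (*s != '\0') {
          char *end;
          unsigned long lo, hi;

          if (!isdigit((unsigned char)*s))
              return -1;
          lo = strtoul(s, &end, 10);
          hi = lo;
          if (*end == '-') { /* range "a-b" */
              if (!isdigit((unsigned char)end[1]))
                  return -1;
              hi = strtoul(end + 1, &end, 10);
          }
          if (lo > hi || hi > max_val)
              return -1;
          for (unsigned long v = lo; v <= hi; v++)
              *mask |= UINT64_C(1) << v;
          if (*end == ',' && end[1] != '\0')
              end++; /* skip the separator */
          else if (*end != '\0')
              return -1; /* trailing comma or unexpected character */
          s = end;
      }
      return 0;
  }

For example, '1,3,5-7' yields the mask 0xEA (bits 1, 3, 5, 6 and 7), and '2-4' yields 0x1C.
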
After successful initialization, the user is presented with the VM Power Manager Guest CLI:

.. code-block:: console

  vm_power(guest)>

To change the frequency of an lcore, use the set_cpu_freq command,
where {core_num} is the lcore (and channel) whose frequency is changed by scaling up/down/min/max:

.. code-block:: console

  set_cpu_freq {core_num} up|down|min|max

To start the application with a power policy configured for sending to the host:

.. code-block:: console

 ./build/guest_vm_power_mgr -l 0-3 -n 4 -- --vm-name=ubuntu --policy=BRANCH_RATIO --vcpu-list=2-4

Once the VM Power Manager Guest CLI appears, issuing the 'send_policy now' command
will send the policy to the host:

.. code-block:: console

  send_policy now

Once the policy is sent to the host, the host application takes over the power monitoring
of the specified cores in the policy.
735