..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

VM Power Management Application
===============================

Introduction
------------

Applications running in virtual environments have an abstract view of
the underlying hardware on the host. In particular, applications cannot see
the binding of virtual to physical hardware.
When looking at CPU resourcing, the pinning of Virtual CPUs (vCPUs) to
Host Physical CPUs (pCPUs) is not apparent to an application,
and this pinning may change over time.
Furthermore, Operating Systems on Virtual Machines (VMs) do not have the ability
to govern their own power policy; the Model Specific Registers (MSRs)
for enabling P-State transitions are not exposed to Operating Systems
running on VMs.

The Virtual Machine Power Management solution shows an example of
how a DPDK application can indicate its processing requirements, using VM-local
information only (vCPU/lcore), to a host-based Monitor. The Monitor is responsible
for accepting requests for frequency changes for a vCPU, translating the vCPU
to a pCPU via libvirt, and effecting the change in frequency.

The solution comprises two high-level components:

#. Example Host Application

   The host application provides a Command Line Interface (CLI) for VM->Host
   communication channel management. The CLI allows adding channels to the Monitor,
   setting and querying the vCPU to pCPU pinning,
   and inspecting and manually changing the frequency of each CPU.
   The CLI runs on a single lcore while the thread responsible for managing
   VM requests runs on a second lcore.

   VM requests arriving on a channel for frequency changes are passed
   to the librte_power ACPI cpufreq sysfs based library.
   The Host Application relies on both qemu-kvm and libvirt to function.

#. librte_power for Virtual Machines

   Using an alternate implementation of the librte_power API, requests for
   frequency changes are forwarded to the host monitor rather than
   the ACPI cpufreq sysfs interface used on the host.

   The l3fwd-power application will use this implementation when deployed on a VM
   (see Chapter 11 "L3 Forwarding with Power Management Application").

.. _figure_24:

**Figure 24. High-level Solution**

|vm_power_mgr_highlevel|

Overview
--------

VM Power Management employs qemu-kvm to provide communication channels
between the host and VMs in the form of Virtio-Serial, which appears as
a paravirtualized serial device on a VM and can be configured to use
various backends on the host. For this example, each Virtio-Serial endpoint
on the host is configured as an AF_UNIX file socket, supporting poll/select
and epoll for event notification.
In this example, each channel endpoint on the host is monitored via
epoll for EPOLLIN events.
Each channel is specified as qemu-kvm arguments or as libvirt XML for each VM,
where each VM can have a number of channels up to a maximum of 64 per VM.
In this example, each DPDK lcore on a VM has exclusive access to a channel.
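
The epoll-based monitoring described above can be pictured with a minimal,
Linux-only Python sketch. It is not the host application's code: the helper
*wait_for_request*, the socket pair standing in for a Virtio-Serial channel,
and the payload are all assumptions made for illustration.

.. code-block:: python

  import select
  import socket


  def wait_for_request(endpoint, timeout=1.0, bufsize=64):
      """Wait for EPOLLIN on `endpoint` and return the bytes read.

      Returns None if nothing arrives within `timeout` seconds.
      """
      epoll = select.epoll()
      try:
          epoll.register(endpoint.fileno(), select.EPOLLIN)
          for _fd, events in epoll.poll(timeout):
              if events & select.EPOLLIN:
                  return endpoint.recv(bufsize)
          return None
      finally:
          epoll.close()


  # Stand-in for one channel: a connected pair of AF_UNIX sockets.
  host_end, vm_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
  vm_end.sendall(b"scale-up")           # hypothetical payload, not the real packet format
  request = wait_for_request(host_end)  # returns once the endpoint is readable

A monitor handling many channels would register every endpoint with a single
epoll object rather than creating one per call, as this sketch does for brevity.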

To enable frequency changes from within a VM, a request via the librte_power interface
is forwarded via Virtio-Serial to the host. Each request contains the vCPU
and power command (scale up/down/min/max).
The API for host and guest librte_power is consistent across environments,
with the selection of VM or Host implementation determined automatically
at runtime based on the environment.

Upon receiving a request, the host translates the vCPU to a pCPU via
the libvirt API before forwarding it to the host librte_power.
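
As a rough illustration of the translation step, the sketch below maps a
(vCPU, command) request onto physical cores. It assumes the pinning is already
available as one bit mask per vCPU; the names *pcpus_from_mask* and
*translate_request* and the pinning table are hypothetical, and the real
application obtains the pinning through libvirt.

.. code-block:: python

  def pcpus_from_mask(mask):
      """Return the physical CPU numbers whose bits are set in `mask`."""
      return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


  def translate_request(pinning, vcpu, command):
      """Map a (vCPU, command) request onto the pCPUs the vCPU is pinned to."""
      if command not in ("up", "down", "min", "max"):
          raise ValueError("unknown power command: %r" % (command,))
      return [(pcpu, command) for pcpu in pcpus_from_mask(pinning[vcpu])]


  # Hypothetical pinning table: vCPU 0 -> pCPU 2, vCPU 1 -> pCPUs 4 and 5.
  pinning = {0: 0x4, 1: 0x30}
  actions = translate_request(pinning, 1, "up")

Each resulting (pCPU, command) pair corresponds to one frequency change that
would be handed to the host power library.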

.. _figure_25:

**Figure 25. VM request to scale frequency**

|vm_power_mgr_vm_request_seq|

Performance Considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~

While the Haswell microarchitecture allows independent power control for each core,
earlier microarchitectures do not offer such fine-grained control.
When deployed on pre-Haswell platforms, greater care must be taken in selecting
which cores are assigned to a VM; for instance, a core will not scale down
until its sibling is similarly scaled.

Configuration
-------------

BIOS
~~~~

Enhanced Intel SpeedStep® Technology must be enabled in the platform BIOS
if the power management feature of DPDK is to be used.
Otherwise, the sysfs directory /sys/devices/system/cpu/cpu0/cpufreq will not exist,
and CPU frequency-based power management cannot be used.
Consult the relevant BIOS documentation to determine how these settings
can be accessed.

Host Operating System
~~~~~~~~~~~~~~~~~~~~~

The Host OS must also have the *acpi_cpufreq* module installed. In some cases
the *intel_pstate* driver may be the default power management environment.
To enable *acpi_cpufreq* and disable *intel_pstate*, add the following
to the grub Linux command line:

.. code-block:: console

  intel_pstate=disable

Upon rebooting, load the *acpi_cpufreq* module:

.. code-block:: console

  modprobe acpi_cpufreq

Hypervisor Channel Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Virtio-Serial channels are configured via libvirt XML:

.. code-block:: xml

  <name>{vm_name}</name>
  <controller type='virtio-serial' index='0'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
  </controller>
  <channel type='unix'>
    <source mode='bind' path='/tmp/powermonitor/{vm_name}.{channel_num}'/>
    <target type='virtio' name='virtio.serial.port.poweragent.{vm_channel_num}'/>
    <address type='virtio-serial' controller='0' bus='0' port='{N}'/>
  </channel>

Here a single controller of type *virtio-serial* is created. Up to 32 channels
can be associated with a single controller, and multiple controllers can be specified.
The convention is to use the name of the VM in the host path *{vm_name}* and
to increment *{channel_num}* for each channel; likewise, the port value *{N}*
must be incremented for each channel.
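
Writing one *channel* element per lcore by hand is repetitive, so the naming
convention above can be scripted. The following Python sketch is only one
possible approach: the helper names, the use of the same number for
*{channel_num}* and *{vm_channel_num}*, and the starting port value of 1 are
assumptions, not requirements stated by this guide.

.. code-block:: python

  import xml.etree.ElementTree as ET


  def channel_element(vm_name, channel_num, port, controller=0):
      """Build one <channel> element following the naming convention above."""
      channel = ET.Element("channel", type="unix")
      ET.SubElement(channel, "source", mode="bind",
                    path="/tmp/powermonitor/%s.%d" % (vm_name, channel_num))
      ET.SubElement(channel, "target", type="virtio",
                    name="virtio.serial.port.poweragent.%d" % channel_num)
      ET.SubElement(channel, "address", type="virtio-serial",
                    controller=str(controller), bus="0", port=str(port))
      return channel


  def channels_xml(vm_name, count, first_port=1):
      """Return `count` channel elements as text, one port per channel."""
      return "\n".join(
          ET.tostring(channel_element(vm_name, n, first_port + n),
                      encoding="unicode")
          for n in range(count))


  # Hypothetical VM name; prints two channel definitions for pasting
  # into the <devices> section of the domain XML.
  print(channels_xml("vm0", 2))

The generated fragments still have to be merged into the domain definition by
hand or with the usual libvirt tooling.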

Each channel on the host will appear in *path*. The directory */tmp/powermonitor/*
must first be created and given qemu permissions:

.. code-block:: console

  mkdir /tmp/powermonitor/
  chown qemu:qemu /tmp/powermonitor

Note that files and directories within /tmp are generally removed upon
rebooting the host, so the above steps may need to be carried out after each reboot.

The serial device as it appears on a VM is configured with the *target* element attribute *name*
and must be in the form of *virtio.serial.port.poweragent.{vm_channel_num}*,
where *vm_channel_num* is typically the lcore channel to be used in DPDK VM applications.

Each channel on a VM will be present at */dev/virtio-ports/virtio.serial.port.poweragent.{vm_channel_num}*.

Compiling and Running the Host Application
------------------------------------------

Compiling
~~~~~~~~~

#. export RTE_SDK=/path/to/rte_sdk
#. cd ${RTE_SDK}/examples/vm_power_manager
#. make

Running
~~~~~~~

The application does not have any specific command line options other than *EAL*:

.. code-block:: console

 ./build/vm_power_mgr [EAL options]

The application requires exactly two cores to run: one core is dedicated to the CLI,
while the other is dedicated to the channel endpoint monitor. For example, to run
on cores 0 & 1 on a system with 4 memory channels:

.. code-block:: console

 ./build/vm_power_mgr -c 0x3 -n 4
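
The *-c* argument is a hexadecimal bit mask in which bit *n* selects core *n*,
so cores 0 and 1 give 0x3. A small Python sketch of that arithmetic follows;
the helper name *coremask* is made up for this example.

.. code-block:: python

  def coremask(cores):
      """Return the hexadecimal core mask with one bit set per listed core."""
      mask = 0
      for core in cores:
          mask |= 1 << core
      return hex(mask)


  host_mask = coremask([0, 1])         # "0x3", as in the command above
  guest_mask = coremask([0, 1, 2, 3])  # "0xf"

The same calculation applies to any other set of cores chosen for the
application.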

After successful initialization, the user is presented with the VM Power Manager CLI:

.. code-block:: console

  vm_power>

Virtual Machines can now be added to the VM Power Manager:

.. code-block:: console

  vm_power> add_vm {vm_name}

When a {vm_name} is specified with the *add_vm* command, a lookup is performed
with libvirt to ensure that the VM exists. {vm_name} is used as a unique identifier
to associate channels with a particular VM and for executing operations on a VM within the CLI.
VMs do not have to be running in order to add them.

A number of commands can be issued via the CLI in relation to VMs:

  Remove a Virtual Machine identified by {vm_name} from the VM Power Manager:

  .. code-block:: console

    rm_vm {vm_name}

  Add communication channels for the specified VM. The virtio channels must be enabled
  in the VM configuration (qemu/libvirt) and the associated VM must be active.
  {list} is a comma-separated list of channel numbers to add; using the keyword 'all'
  will attempt to add all channels for the VM:

  .. code-block:: console

    add_channels {vm_name} {list}|all

  Enable or disable the communication channels in {list} (comma-separated)
  for the specified VM. Alternatively, {list} can be replaced with the keyword 'all'.
  Disabled channels will still receive packets on the host, however the commands
  they specify will be ignored. Set status to 'enabled' to begin processing requests again:

  .. code-block:: console

    set_channel_status {vm_name} {list}|all enabled|disabled

  Print to the CLI the information on the specified VM. The information
  lists the number of vCPUs and the pinning to pCPU(s) as a bit mask, along with
  any communication channels associated with the VM and the status of each channel:

  .. code-block:: console

    show_vm {vm_name}

  Set the binding of a Virtual CPU on the VM named {vm_name} to the Physical CPU mask:

  .. code-block:: console

    set_pcpu_mask {vm_name} {vcpu} {pcpu}

  Set the binding of a Virtual CPU on the VM to the Physical CPU:

  .. code-block:: console

    set_pcpu {vm_name} {vcpu} {pcpu}

Manual control and inspection can also be carried out in relation to CPU frequency scaling:

  Get the current frequency for each core specified in the mask:

  .. code-block:: console

    show_cpu_freq_mask {mask}

  Set the current frequency for the cores specified in {core_mask} by scaling each up/down/min/max:

  .. code-block:: console

    set_cpu_freq_mask {core_mask} up|down|min|max

  Get the current frequency for the specified core:

  .. code-block:: console

    show_cpu_freq {core_num}

  Set the current frequency for the specified core by scaling up/down/min/max:

  .. code-block:: console

    set_cpu_freq {core_num} up|down|min|max
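
The effect of the up/down/min/max commands can be pictured as moving an index
through a table of available frequencies. The Python sketch below assumes a
table ordered from highest to lowest frequency; the function name *scale* and
the frequency values are hypothetical and are not taken from the application.

.. code-block:: python

  def scale(freqs, index, command):
      """Return the new index into `freqs` (highest first) after `command`."""
      last = len(freqs) - 1
      if command == "max":
          return 0
      if command == "min":
          return last
      if command == "up":
          return max(index - 1, 0)     # already at the top: stay there
      if command == "down":
          return min(index + 1, last)  # already at the bottom: stay there
      raise ValueError("unknown power command: %r" % (command,))


  # Hypothetical frequency table in kHz, highest first.
  freqs = [2300000, 2200000, 2100000]
  index = scale(freqs, 1, "up")        # moves to the highest entry
  current = freqs[index]

Scaling up at the highest entry, or down at the lowest, leaves the index
unchanged in this sketch.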

Compiling and Running the Guest Applications
--------------------------------------------

For compiling and running l3fwd-power, see Chapter 11 "L3 Forwarding with Power Management Application".

A guest CLI is also provided for validating the setup.

For both l3fwd-power and guest CLI, the channels for the VM must be monitored by the
host application using the *add_channels* command on the host.

Compiling
~~~~~~~~~

#. export RTE_SDK=/path/to/rte_sdk
#. cd ${RTE_SDK}/examples/vm_power_manager/guest_cli
#. make

Running
~~~~~~~

The application does not have any specific command line options other than *EAL*:

.. code-block:: console

 ./build/guest_vm_power_mgr [EAL options]

For example purposes, the application uses a channel for each lcore enabled.
For example, to run on cores 0,1,2,3 on a system with 4 memory channels:

.. code-block:: console

 ./build/guest_vm_power_mgr -c 0xf -n 4

After successful initialization, the user is presented with the VM Power Manager Guest CLI:

.. code-block:: console

  vm_power(guest)>

To change the frequency of an lcore, use the *set_cpu_freq* command,
where {core_num} is the lcore and channel whose frequency is changed by scaling up/down/min/max:

.. code-block:: console

  set_cpu_freq {core_num} up|down|min|max

.. |vm_power_mgr_highlevel| image:: img/vm_power_mgr_highlevel.*

.. |vm_power_mgr_vm_request_seq| image:: img/vm_power_mgr_vm_request_seq.*
362