# Scheduler {#scheduler}

SPDK's event/application framework (`lib/event`) now supports scheduling of
lightweight threads. Schedulers are provided as plugins, called
implementations. A default implementation is provided, but users may wish to
write their own scheduler to integrate into broader code frameworks or meet
their performance needs.

This feature should be considered experimental and is disabled by default. When
enabled, the scheduler framework gathers data for each SPDK thread and reactor
and passes it to a scheduler implementation to perform one of the following
actions.

## Actions

### Move a thread

`spdk_thread`s can be moved to another reactor. Schedulers can examine the
suggested cpu_mask value for each lightweight thread to see if the user has
requested specific reactors, or choose a reactor using whatever algorithm they
deem fit.

### Switch reactor mode

By default, reactors run in a mode that constantly polls for new actions for the
most efficient processing. Schedulers can switch a reactor into a mode that
instead waits for an event on a file descriptor. On Linux, this is implemented
using epoll. This results in reduced CPU usage but may be less responsive when
events occur. A reactor cannot enter this mode if any `spdk_thread`s are
currently scheduled to it. This limitation is expected to be lifted in the
future, allowing `spdk_thread`s to enter interrupt mode.

### Set frequency of CPU core

The frequency of CPU cores can be modified by the scheduler in response to
load. Only CPU cores that match the application cpu_mask may be modified. The
mechanism for controlling CPU frequency is pluggable and the default provided
implementation is called `dpdk_governor`, based on the `rte_power` library from
DPDK.

#### Known limitation

When SMT (Hyperthreading) is enabled, the two logical CPU cores sharing a single
physical CPU core must run at the same frequency. If one of the two logical
CPU cores is outside the application cpu_mask, the policy and frequency on that
core have to be managed by the administrator.

## Scheduler implementations

The scheduler in use may be controlled by JSON-RPC. Please use the
[framework_set_scheduler](jsonrpc.html#rpc_framework_set_scheduler) RPC to
switch between schedulers or change their options. Currently, only the dynamic
scheduler supports changing its parameters.

[spdk_top](spdk_top.html#spdk_top) is a useful tool to observe the behavior of
schedulers in different scenarios and workloads.
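
As a rough illustration of the RPC mentioned above, the following sketch only builds a JSON-RPC 2.0 request body for `framework_set_scheduler` and prints it; it does not contact a running application. The helper name `build_set_scheduler_request` is hypothetical, and the option keys `load_limit`, `core_limit` and `core_busy` are assumed to mirror the dynamic scheduler parameters named in this document. The values are placeholders, so check the JSON-RPC documentation for the exact parameter names and defaults.

```python
import json


def build_set_scheduler_request(name, request_id=1, **options):
    """Build a JSON-RPC 2.0 request body (hypothetical helper, not part of SPDK)."""
    params = {"name": name}
    # Option keys are passed through unchanged; their names are assumptions.
    params.update(options)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "framework_set_scheduler",
        "params": params,
    }


# Placeholder values, chosen only for illustration.
request = build_set_scheduler_request(
    "dynamic", load_limit=20, core_limit=80, core_busy=95
)
print(json.dumps(request, indent=2))
```

The printed body would still need to be sent to the application's RPC socket by whatever client is in use.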

### static [default]

The `static` scheduler is the default scheduler and does no dynamic scheduling.
Lightweight threads are distributed round-robin among reactors, respecting
their requested cpu_mask, only at application startup, and then they are never
moved. This is equivalent to the previous behavior of the SPDK event/application
framework.

The `static` scheduler cannot be re-enabled after a different scheduler has been
selected, because there is currently no way to save the original SPDK thread
distribution configuration.
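
The round-robin placement described for the `static` scheduler can be pictured with the minimal sketch below. It is not SPDK's implementation: the function `assign_round_robin` is hypothetical, cpu_masks are modelled as sets of reactor ids, and an empty set is assumed to mean "no preference".

```python
def assign_round_robin(thread_masks, reactors):
    """Pick a reactor for each thread in round-robin order, honoring its mask.

    thread_masks: list of sets of allowed reactor ids (empty set = any reactor).
    reactors: ordered list of reactor ids.
    """
    placement = []
    next_index = 0  # position in `reactors` where the next search starts
    for allowed in thread_masks:
        for offset in range(len(reactors)):
            candidate = reactors[(next_index + offset) % len(reactors)]
            if not allowed or candidate in allowed:
                placement.append(candidate)
                # Continue the rotation just past the reactor that was used.
                next_index = (next_index + offset + 1) % len(reactors)
                break
        else:
            raise ValueError("cpu_mask matches no reactor")
    return placement


placement = assign_round_robin([set(), {2}, set(), {0}, set()], [0, 1, 2])
print(placement)  # [0, 2, 0, 0, 1]
```

Because the placement is computed once and never revisited, nothing records how threads were laid out before another scheduler started moving them.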

### dynamic

The `dynamic` scheduler is designed for power saving and reduction of CPU
utilization, especially in cases where workloads show large variations over
time. In SPDK, thread and core workloads are measured in CPU ticks. The busy
ticks are compared with all the ticks elapsed since the last check, which gives
the `busy time`.

`busy time = busy ticks / (busy ticks + idle ticks) * 100 %`

A thread is considered active if its busy time is over the `load limit`
parameter.
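
The busy time formula and the `load limit` check can be worked through in a few lines. This is a sketch under stated assumptions rather than the scheduler's code: the function names are hypothetical, the strict "greater than" comparison is an assumption, and the tick counts and the limit of 20 are made-up inputs.

```python
def busy_time_percent(busy_ticks, idle_ticks):
    """Return busy time as a percentage of all ticks since the last check."""
    total = busy_ticks + idle_ticks
    if total == 0:
        return 0.0  # nothing measured yet, treat as idle
    return 100.0 * busy_ticks / total


def is_active(busy_ticks, idle_ticks, load_limit):
    """A thread counts as active when its busy time is over the load limit."""
    return busy_time_percent(busy_ticks, idle_ticks) > load_limit


example_busy_time = busy_time_percent(250, 750)
print(example_busy_time)                      # 25.0
print(is_active(250, 750, load_limit=20))     # True
print(is_active(100, 900, load_limit=20))     # False
```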

Active threads are distributed equally among reactors, taking cpu_mask into
account. All idle threads are moved to the main core. Once an idle thread becomes
active, it is redistributed again. The dynamic scheduler monitors core workloads
and redistributes SPDK threads across cores so that none of them is over the
`core limit`. If a core's utilization surpasses this threshold, the scheduler
should move threads off that core until this condition no longer applies. A core
might also be in an overloaded state, which indicates that moving threads off
this core will not bring its utilization under the `core limit` and that its
threads are unable to process all the I/O they are capable of, because they share
CPU ticks with other threads. The threshold used to decide whether a core is
overloaded is called `core busy`. Note that threads residing on an overloaded
core will not perform as well as other threads, because the CPU ticks intended
for them are limited by other threads on the same core.
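
The two core thresholds can be read as a simple three-way classification, sketched below. The function `classify_core` and its state labels are hypothetical, the sketch assumes `core busy` is not lower than `core limit`, and the threshold values in the example are placeholders; the real scheduler's decisions involve more than this comparison.

```python
def classify_core(busy_percent, core_limit, core_busy):
    """Classify a core by utilization (illustrative labels, not SPDK names)."""
    if busy_percent > core_busy:
        # Moving threads away is not expected to bring the core under core_limit.
        return "overloaded"
    if busy_percent > core_limit:
        # The scheduler should move threads off this core.
        return "over_limit"
    return "ok"


# Placeholder thresholds, chosen only for illustration.
states = [classify_core(p, core_limit=80, core_busy=95) for p in (50, 85, 99)]
print(states)  # ['ok', 'over_limit', 'overloaded']
```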

When a reactor has no scheduled `spdk_thread`s, it is switched into interrupt
mode and stops actively polling. After enough threads become active, the
reactor is switched back into poll mode and threads are assigned to it again.

The main core can contain active threads only when their execution time does
not exceed the sum of all idle threads. When no active threads are present on
the main core, the frequency of that CPU core will decrease as the load
decreases. All CPU cores corresponding to the other reactors remain at maximum
frequency.

The dynamic scheduler is currently the only one that allows manual setting of
its parameters.

Current values of scheduler parameters can be displayed by using the
[framework_get_scheduler](jsonrpc.html#rpc_framework_get_scheduler) RPC.