25 Aug 2006 21:50

CPU frequency scaling in HAL

Yesterday, my HAL addon that takes care of CPU frequency scaling was committed to the HAL git master branch. Finally, after one and a half months of discussions, new patches, and no responses.

Its main goal is to unify the way Linux distributions do CPU frequency scaling. At the moment there are different daemons out there doing the job: powernowd, cpuspeed and powersaved. Powersaved is currently the only one which supports the kernel governors (powersave, performance and ondemand) and also has a userspace governor implementation. All this is wrapped in a nice D-Bus interface to make it easily controllable by higher-level applications like desktop GUIs (e.g. kpowersave).

But unfortunately powersaved didn't gain enough acceptance in the community, for whatever reason. And my main goal is to improve power management in Linux. So the only solution which seemed to be accepted by GNOME, KDE and the general community was to put CPU frequency scaling functionality into HAL.

In the end, this was not a bad idea. The decision forced me to reconsider some parts of the present implementation and design which we have in powersaved. And I really had a great idea. Maybe the best idea I ever had. It's a real masterpiece ;-)

So in the future, it will come along with most modern distributions. And hopefully everybody will use it ;-) At least the gnome-power-manager maintainer has already stated his willingness to make use of it in his blog.

The following explanations are meant as the documentation for the CPUFreq addon. They summarize what has been discussed on the HAL and powersave-devel mailing lists during the last weeks and what the addon is all about. They should answer the common questions which may come up. If anything is still unclear, please don't hesitate to send me an email or add a comment.

Available methods on the interface org.freedesktop.Hal.Device.CPUFreq.

Method: SetCPUFreqGovernor (string)

Parameters: The name of the governor to set. Get a list of available governors with the GetCPUFreqAvailableGovernors method.

Sets a CPU frequency scaling governor for all CPUFreq interfaces the kernel provides. If the userspace governor is set, this interface also provides a proper scaling mechanism. The default performance is set to 50.

Method: SetCPUFreqPerformance (integer)

Parameters: The performance between 1 and 100 to set in dynamic scaling modes.

Sets the performance of the dynamic scaling mechanism. This method summarizes and abstracts all the different settings which can be made for dynamic frequency adjustments, such as at which load to switch up the frequency or how many steps the mechanism should traverse until reaching the maximum frequency. The higher the value, the more performance you get. In other words, the higher the value, the sooner and the more often the frequency is switched up.

Method: SetCPUFreqConsiderNice (boolean)

Parameters: Whether or not niced processes should be considered in the CPU load calculation.

Sets whether or not niced processes are considered in the CPU load calculation. If niced processes are considered, they can cause a frequency increment. If they are not, their load alone does not trigger the scaling mechanism to switch up the frequency. The default setting is 'false'.

Method: GetCPUFreqGovernor (void)

Gets the currently active governor for all CPU frequency interfaces (returns string).

Method: GetCPUFreqPerformance (void)

Gets the currently active performance setting if a dynamic scaling mechanism is in use (returns integer between 1 and 100).

Method: GetCPUFreqConsiderNice (void)

Returns whether or not niced processes are considered during CPU load calculation (returns boolean).

Method: GetCPUFreqAvailableGovernors (void)

Returns a list of strings of all available governors which could be set with the SetCPUFreqGovernor method.
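As a rough illustration of how a client might drive these methods, here is a minimal Python sketch. It assumes the dbus-python bindings, the service name org.freedesktop.Hal and the object path /org/freedesktop/Hal/devices/computer; none of these are stated above, so treat them as assumptions. The CPUFreqClient class is hypothetical and simply wraps whatever proxy object it is given.

```python
CPUFREQ_IFACE = "org.freedesktop.Hal.Device.CPUFreq"

# Assumptions, not taken from the text above: service name and object path.
HAL_SERVICE = "org.freedesktop.Hal"
HAL_COMPUTER_PATH = "/org/freedesktop/Hal/devices/computer"


def connect():
    """Return a D-Bus proxy for the CPUFreq interface (needs dbus-python)."""
    import dbus  # imported lazily so the rest works without D-Bus installed

    bus = dbus.SystemBus()
    obj = bus.get_object(HAL_SERVICE, HAL_COMPUTER_PATH)
    return dbus.Interface(obj, dbus_interface=CPUFREQ_IFACE)


class CPUFreqClient:
    """Hypothetical thin wrapper around a proxy exposing the methods above."""

    def __init__(self, proxy):
        self.proxy = proxy

    def set_governor(self, name):
        # Only try governors the addon reports as available.
        available = [str(g) for g in self.proxy.GetCPUFreqAvailableGovernors()]
        if name not in available:
            raise ValueError("governor %r not in %r" % (name, available))
        self.proxy.SetCPUFreqGovernor(name)

    def set_performance(self, value):
        # The documented range is 1..100.
        if not 1 <= value <= 100:
            raise ValueError("performance must be between 1 and 100")
        self.proxy.SetCPUFreqPerformance(value)

    def status(self):
        # GetCPUFreqPerformance only applies while a dynamic mechanism is in use.
        return {
            "governor": str(self.proxy.GetCPUFreqGovernor()),
            "performance": int(self.proxy.GetCPUFreqPerformance()),
            "consider_nice": bool(self.proxy.GetCPUFreqConsiderNice()),
        }
```

With a real bus, `CPUFreqClient(connect())` would be the entry point. Because the wrapper accepts any object with the same method names, it can also be exercised against a stub.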

Errors the above methods may raise on the interface org.freedesktop.Hal.Device.CPUFreq.

Error: GeneralError (void)

Detail field: The exact error.

A general error occurred.

Error: UnknownGovernor (void)

Detail field: The governor the caller tried to set.

The governor the caller tried to set doesn't exist.

Error: PermissionDenied (void)

Detail field: The privilege the caller needs to execute the method.

The caller doesn't have the privilege to execute this method.

Error: NoSuitableGovernor (void)

Detail field: The method the caller tried to execute.

The method executed doesn't exist for the currently active governor.

Error: GovernorInitFailed (void)

Detail field: The reason for the failure.

The initialization of the governor failed.
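A caller will usually want to turn these errors into something readable. The following Python sketch maps each error name to a message built from its detail field. It assumes that the full D-Bus error name is the interface name followed by the short name listed above; the describe_error helper and the message wording are hypothetical.

```python
CPUFREQ_IFACE = "org.freedesktop.Hal.Device.CPUFreq"

# Short error name -> message template; {detail} is the detail field.
ERRORS = {
    "GeneralError": "a general error occurred: {detail}",
    "UnknownGovernor": "governor '{detail}' does not exist",
    "PermissionDenied": "caller lacks the privilege '{detail}'",
    "NoSuitableGovernor": "method '{detail}' is not available for the active governor",
    "GovernorInitFailed": "governor initialization failed: {detail}",
}


def describe_error(error_name, detail):
    """Turn an error name plus its detail field into a readable message."""
    prefix = CPUFREQ_IFACE + "."
    # Assumption: full error names are the interface name plus the short name.
    short = error_name[len(prefix):] if error_name.startswith(prefix) else error_name
    template = ERRORS.get(short)
    if template is None:
        return "unexpected error %s: %s" % (error_name, detail)
    return template.format(detail=detail)
```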

What's the CPUFreq addon all about

Addon-cpufreq supports kernel governors and also implements a userspace controlling mechanism. That makes it the "all you need for CPUFreq" application. Furthermore, to avoid making things unnecessarily complicated for desktop applications, it is supposed to abstract all the different settings you can make for the different governors. One thing which wasn't available until now is the possibility to control the performance of dynamically scaled CPUs. There's a method to set the performance (from 1 to 100) which other applications can easily use, for example by showing a nice progress bar or the like. That makes it unique in contrast to former solutions.

The biggest advantage is that you get CPUFreq out of the box on every system supporting HAL without the need to install another daemon. Another one is that desktop applications like gnome-power-manager or kpowersave can finally use one common interface.

In general, there are three different policies one can choose from:

  1. Statically min frequency (corresponds to the powersave governor)
  2. Statically max frequency (corresponds to the performance governor)
  3. Dynamic frequency (ondemand or userspace governor)

In an ideal world, the first two shouldn't be used at all. But at least the performance governor is still used because you get a little bit more performance in comparison with the dynamic policy, particularly as soon as you have many small processes running. The default dynamic setting can be and was sufficient in most cases. But it's too general a setting IMO. So to basically get rid of the two static policies, I'm splitting the dynamic policy into several parts.

The dynamic mechanisms (the ondemand governor, the conservative governor, the userspace implementation and so on) each consist of several different settings and configuration options, like at which load to switch up, how many steps the mechanism should traverse until reaching the maximum frequency and so on. The SetCPUFreqPerformance method combines all these settings into one. The performance can be set between 1 and 100.

In general, the higher the value, the more often and the faster the frequency is switched up. Each governor has some sort of fixed CPU load limit (upthreshold) at which it switches up frequencies. For instance, the ondemand governor has the upthreshold set to 80 by default. This corresponds to a performance value of 50. As an example, if you change the performance to 75, the upthreshold is set to about 45. So if the CPU load reaches a value higher than 45, the frequency is switched up. If the upthreshold were 80 and a process only needed 60 percent of the CPU, the frequency wouldn't be switched up. If the performance value were 1 (lowest performance), the upthreshold would be 99. That means the frequency is really only increased if you have a process which needs all the CPU power (100%, for instance when compiling).
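To make the relationship between performance and upthreshold tangible, here is a small Python sketch. It is not the addon's actual formula. It merely interpolates linearly between the three pairs quoted above (1 to 99, 50 to 80, 75 to 45) and, as a pure assumption, extends the last segment for values above 75. The function names are hypothetical.

```python
# (performance, upthreshold) pairs quoted in the text.
KNOWN_POINTS = [(1, 99), (50, 80), (75, 45)]


def up_threshold(performance):
    """Interpolate an upthreshold for a performance value between 1 and 100."""
    if not 1 <= performance <= 100:
        raise ValueError("performance must be between 1 and 100")
    pts = KNOWN_POINTS
    # Find the segment containing the value. If none matches (value > 75),
    # the loop ends on the last segment, which is then extended (assumption).
    for (p0, t0), (p1, t1) in zip(pts, pts[1:]):
        if performance <= p1:
            break
    slope = (t1 - t0) / (p1 - p0)
    return round(t0 + slope * (performance - p0))


def switches_up(cpu_load, performance):
    """True if the given load percentage exceeds the derived upthreshold."""
    return cpu_load > up_threshold(performance)
```

With these numbers, a process needing 60 percent of the CPU does not trigger a switch at performance 50 but does at performance 75, which matches the example in the paragraph above.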

Here is a small comparison. The two main situations where policies differ are:

  • On AC power: You want to do dynamic frequency scaling, but it's not that important to reduce power consumption. You want to have the maximum performance as soon as you start a process but still want to reduce the power drain as soon as the system is idle (e.g. for thermal reasons). So you need a more aggressive dynamic CPU frequency scaling where you switch up sooner and more often. So for instance, you set the performance to 75.
  • On battery: The primary goal is to increase battery life. But you still want a job done as fast as possible as soon as there is a need, for instance when compiling. So you might want a more tentative frequency adjustment. The performance might be set to 25 in this case.
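The two situations could be expressed as a tiny policy function. This Python sketch uses the performance values 75 and 25 from the list above and the consider-nice choice discussed in the FAQ further down. Picking the ondemand governor, and the policy_for and apply_policy helpers themselves, are assumptions made for illustration.

```python
def policy_for(on_ac_power):
    """Return hypothetical CPUFreq settings for the two situations above."""
    if on_ac_power:
        # Aggressive: switch up sooner and more often.
        return {"governor": "ondemand", "performance": 75, "consider_nice": True}
    # Tentative: favour battery life.
    return {"governor": "ondemand", "performance": 25, "consider_nice": False}


def apply_policy(proxy, on_ac_power):
    """Push the settings through any object exposing the CPUFreq methods."""
    settings = policy_for(on_ac_power)
    proxy.SetCPUFreqGovernor(settings["governor"])
    proxy.SetCPUFreqPerformance(settings["performance"])
    proxy.SetCPUFreqConsiderNice(settings["consider_nice"])
    return settings
```

A power manager would call apply_policy whenever the AC adapter state changes.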

The reason why I chose such a fine-grained range (from 1 to 100) is to get an interface that is not limited to only one implementation. For the CPUFreq case, one can imagine having only five steps like min, low, medium, high, max. But that's something the GUI can and should decide. It can map these 100 steps to whatever it likes, but I don't think that's the job of the addon. And maybe some system (embedded? whatever) would like a completely different implementation with 20, 30, or even 100 steps but still wants to take advantage of the common interface. It then wouldn't be limited to a predefined range.
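A GUI that only wants the five named steps could do the mapping itself, for example like this Python sketch. The concrete values chosen for each step are an assumption (they merely reuse 25, 50 and 75 from the examples in this text), and both helper names are hypothetical.

```python
# Hypothetical GUI-side mapping from named steps to the 1..100 range.
GUI_STEPS = {"min": 1, "low": 25, "medium": 50, "high": 75, "max": 100}


def step_to_performance(step):
    """Value to pass to SetCPUFreqPerformance for a named step."""
    return GUI_STEPS[step]


def performance_to_step(value):
    """Named step closest to a raw value from GetCPUFreqPerformance."""
    return min(GUI_STEPS, key=lambda name: abs(GUI_STEPS[name] - value))
```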

Additionally, if a policy decision maker (like a GUI) doesn't want to touch the performance setting, it doesn't need to. It gets a good default setting. The default is 50 and corresponds to a configuration which is either the kernel default or, in the userspace case, one which has proven stable and useful for several years now.

FAQ

Q: What's the SetCPUFreqConsiderNice setting all about?

A: With the ondemand governor this setting is made through /sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load. The userspace implementation evaluates the CPU nice load when calculating the CPU load, too. The setting specifies whether niced processes can cause a CPU frequency increment although they don't really need that much CPU power and thus wouldn't jump over the UPTHRESHOLD where frequencies are usually increased. It's a policy decision. On AC power, for example, you also want to consider niced processes because switching up CPU frequencies more often isn't that problematic. On battery, on the other hand, you want a more passive frequency policy and thus you don't consider niced processes.
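The effect of the setting on the load calculation can be sketched in a few lines of Python. This is an illustration only, not the addon's code. It assumes two samples of the first four counters of the cpu line in /proc/stat (user, nice, system, idle), and the cpu_load helper is hypothetical. When niced processes are not considered, their time is simply not counted as load.

```python
def cpu_load(prev, cur, consider_nice):
    """Load percentage between two (user, nice, system, idle) tick samples."""
    user, nice, system, idle = (c - p for p, c in zip(prev, cur))
    total = user + nice + system + idle
    if total == 0:
        return 0.0  # no ticks elapsed between the samples
    busy = user + system
    if consider_nice:
        busy += nice  # niced work counts as load
    return 100.0 * busy / total
```

For a machine that spends most of its time in a niced background job, the same pair of samples yields a high load with consider_nice set and a low one without, so only the former would cross the upthreshold.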

Q: Why not export a D-Bus interface for each of the processors in the system?

A: I thought about this possibility, too, but came to the conclusion that it doesn't make sense. The main reasons are:

  1. There are several dependencies between the different CPUs. The cpufreq interfaces below /sys/devices/system/cpu list all available CPUs, each in one directory. However, you cannot assume that you can simply control each CPU through its own interface. There's a file affected_cpus in each directory listing the CPUs which are controlled through this interface. So it could look like:

    cpu0/cpufreq/ -> controls cpu0, cpu1
    cpu1/cpufreq/ -> controls cpu0, cpu1
    …

or:

    cpu0/cpufreq/ -> controls cpu0, cpu1, cpu2, cpu3
    …

or:

    cpu0/cpufreq/ -> only controls cpu0
    cpu1/cpufreq/ -> only controls cpu1
    …

It doesn't make sense to export these dependencies. This would make things unnecessarily complicated for higher-level applications.

  2. I can't imagine a use case where one might want to have different settings for different CPUs.
  3. Some of the cpufreq kernel developers are even against the possibility to set different governors for the CPUs.
  4. There's a new powernow-k8 driver pending where you can only scale all CPUs at a time, which would fit perfectly into the current design.
  5. There's currently no suitable device object on which to export it. There's only acpiCPU*. Except in very new implementations, cpufreq in general has nothing to do with ACPI; it also works with APM or even without ACPI and APM.
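To illustrate the first point, the following Python sketch collapses the per-CPU lists of affected CPUs into distinct control groups. It assumes each file holds space-separated CPU numbers, and it takes the file contents as a dictionary rather than reading sysfs directly. The control_groups helper is hypothetical.

```python
def control_groups(affected):
    """Collapse per-CPU affected_cpus lists into distinct control groups.

    `affected` maps a CPU number to the content of its affected_cpus file,
    e.g. {0: "0 1", 1: "0 1"}.
    """
    groups = []
    for cpu in sorted(affected):
        group = tuple(sorted(int(n) for n in affected[cpu].split()))
        if group not in groups:
            groups.append(group)
    return groups
```

In the first layout shown above this yields a single group containing both CPUs, in the last one a separate group per CPU. An application would have to deal with every such variation if the dependencies were exported.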

Q: Isn't there a possibility to set the frequency manually?

A: No. Usually I'm one of those arguing for exporting all possible features to the user on the desktop ;-) But… this doesn't really fit into the concept of this addon. It would be a method that could only be used if the userspace governor is set. And that contradicts the concept of using common interfaces for all governors. And to be honest, we all hope that the userspace governor won't be needed anymore someday. However, it would be possible. It would need another method SetCPUFreqFrequency and another property 'cpufreqavailablefrequencies'. But I'd really like to avoid that. This would also be something which we could add afterwards if it turns out to be really needed.

 

