Scheduling Policy in Linux Kernel v2.6


In this article, I have talked about some general things about a task scheduler and then given details about linux kernel scheduler for v2.4 and v2.6.

1. General Description of scheduler

Task Scheduler is one of the most important features of any Operating system. Apart from memory management and handling of hardware, task scheduling forms the core of any OS. Scheduler in bare most form, arranges all the active tasks in the system, in form of some list, based upon some metric which is usually an integer value called as priority value of the process; thereafter it chooses the task with the highest priority and allocates CPU and other relevant resources to it. The Process runs till its time quota expires or goes to some waiting state after calling for some i/o or signal or the process terminates or gets completed. In each of these cases, this process is removed from active list and and in case its not terminated or completed, it is transferred to some other (waiting) queue. In this process, the priority of the task is modified accordingly and thus, its position in the list changes. The question of when and how this is done, depends upon design philosophy of the OS.

The main areas of complexity in any task scheduling algorithm are listed below

1 Arranging tasks as per their priority.
2 Recalculating their priority
3 Picking task with highest priority.

Time complexity of each of the step above depends upon how we are storing the tasks and how we do the work. Before designing any algorithm, one should be clear about the objectives because designing any software is a trade off process wherein, in order to achieve something, one has to sacrifice some other thing. The basic pivotal question about writing a scheduler should be �What are tasks which are to be scheduled?� rather than "what should be done to write a good scheduler." he scheduler for any Real time Embedded systems would ideally lay more emphasis on minimizing response time to critical tasks (making system look more and more deterministic) whereas a scheduler meant for desktop systems would concentrate on minimizing interruptions and maximizing CPU utilization. In general, the main aim is not getting one thing right, but achieving all things at the same time.

2. Scheduler of Linux Kernel v2.4

The kernelv2.4 scheduler is basically a time slice highest priority non-priority scheduler. All tasks are arranged in form a of a double link list (called active queue) as per their priority. The scheduler picks the task with highest priority, executes it and then puts it in another queue (another link list) called expired queue. In due time, the active list gets empty and expired list is full. Then the scheduler reassigns new priorities to each task and then the active queue is rebuild. Thus essentially, the scheduling algorithm is having a time complexity of Order n or O(n) where n is the number of tasks. The linear factor comes from the fact that Since link list offers linear access so one needs to travel sequenctially till the last to process the last element.

For both SMP and uniprocessor system, there is a single global active queue. So every processor that needs to access it must first fetch a spinlock called GLOBAL BIG KERNEL LOCK and then modify the task list and thereafter release the lock.

Apart from that, for SMP based systems, the scheduler doesn�t harness the spatial parallelism being offered by the system. Here, there is a very loose binding between the tasks and the processor. Suppose a task T is executing on processor P1. All relevant calculations and context would be P1's local cache. Now T goes for some I/0 activity and gets scheduled out. Next time, there is no guarantee that T would be scheduled to P1. If allocated a new processor, it has to reload the cache of new processor and this will further add delay as well as redundancy in the system. Particularly in SMP system, idle processors start executing waiting processes whose time slices haven't yet been exhausted, causing processes to start "bouncing" between processors. This affects the overall system performance. All these factors make the system less responsive. Further, the 2.4kernel is non-preemptive. If a task makes a system call and is running code in kernel mode, it cannot be interrupted until that kernel code is completed. If the running task in question is the low-priority task, the nonprime-emptive nature of the pre-2.6 kernel effectively gives this task priority, thus, impeding the responsiveness of the higher-priority task.

All these factors are major hurdles to make a highly responive and deterministic scheduler. These factored are remedied in scheduler design for kernel v2.6

3. Scheduler of Linux Kernel v2.6

The three performance constraints described above, namely, the linear time complexity of the scheduling algorithm, lock contention for the single run queue and the lack of CPU affinity represent significant barriers to scalability in the scheduler of kernel older than 2.6.

The major change in v2.6 is that the kernel is made pre-emptible i.e., a higher priority task can now interrupt the lower-priority task even during the processing of a system call. The high-priority task can, therefore, get control of the CPU faster than was previously possible and deliver more responsive service for time-critical tasks. Of course there are still regions of kernel code wherein the task is not pre-emptible.

Further, there is a significant change in how tasks context are stored. The scheduler follows a new scheduling algorithm popularly called O(1) algorithm .The new scheduler instead of using a single global run queue, deploys one queue per processor. This change remedies the last two issues i.e., multiprocessor scalability and CPU affinity. Having run queues available on a per-processor basis eliminates the need to serialize one global run queue and thus the scheduler can concurrently be accessed by multiple processors, each on their own local queue.

CPU affinity is naturally improved in this scheme. Because run queues are local to each processor, an interrupted task will resume on the same processor it was running on before the interruption, and any performance improvement realized by local processor and memory caches is maintained. However maintaining a total CPU affinity might not be a good idea as it would go against the goal of distributing work equally among processors. Further it might lead to more subtle problems of cache coherency.

Further another remarkable change has been done in the way tasks contexts are stored. Even despite using multiple queues (doubly link lists), queues at each processor still grow linearly with the number of tasks in the system and thus to process the last task, it still needs to scan all n entries, and the O(n) behavior would still occur. To remedy this, the new scheduler implements a priority-based array of task entries (implemented via Heap) that enables the highest-priority task to be found quickly. It also eliminates the recalculation of priorities and time slices by maintaining two queues: an active queue and an expired queue.

In this scheme, the scheduler recalculates the time slice of an expired task before it places it on the expired queue. When all the tasks expire, the scheduler simply needs to swap the active and expired queue pointers and schedule the next task. Long scans of run queues are, thus, eliminated. This process takes the same amount of processing, irrespective of the number of tasks in the system. It no longer depends on the value of n, but is a fixed constant. In the same notation, this constant behavior is designated as O(1), and the new scheduler is often referred to as the �O(1) scheduler.�

4. Conclusions

Any algorithm can be effectively optimized if we can clearly identify what objectives we need to achieve. Further how data is stored, also matters. The scheduler for kernel v2.6 performs much better than for scheduler of v2.4 for obvious reasons of preemptibility, scalability, better CPU affinity as well as less processing. However, there is lot to be done in this area for SMP systems as well as in areas of hard real time operating systems.  


I am Nikhil Bhargava from Delhi, India. I am a Computer Engineer with research intrests in mobile networks and operating systems.Comments and Suggestions are always welcome.

Hosted by www.Geocities.ws

1