Every process in Linux goes through a variety of states throughout itβs lifetime. By coding these states, we can form the state machine of a process.
The KSE (Kernel Scheduling Entity) on a Linux system is the thread. Each thread has its own state, which is represented by a set of flags in the task_struct structure. Convention uses the word process. However, we will ignore convention going forward.
The states that a Linux thread can be in are:
TASK_RUNNING: The thread is either running on a CPU or is ready to run. In either state, the thread is put into the run queue. Linux maintains one run queue per core on the CPU.
TASK_INTERRUPTIBLE: The thread is sleeping and can be woken up by signals. Tasks that are waiting are put in a wait queue.
TASK_UNINTERRUPTIBLE: The thread is sleeping and cannot be woken up by signals.
TASK_STOPPED: The thread has been stopped, usually by a signal.
TASK_TRACED: The thread is being traced by another process (e.g., a debugger).
EXIT_ZOMBIE: The thread has finished execution but still has an entry in the process table.
EXIT_DEAD: The thread has been completely removed from the process table.
Once a thread has been created via fork(), clone(), or the pthread_create() API, it informs the process scheduler that it is now ready to be scheduled for execution.
The CFS is the default scheduling algorithm used in Linux for SCHED_OTHER threads.
It aims to allocate CPU time fairly among all runnable threads, ensuring that each thread gets a proportional share of the CPU based on its weight (priority).
CFS uses a red-black tree data structure to manage runnable threads, allowing for efficient insertion, deletion, and selection of the next thread. When a thread becomes runnable, it is added to the red-black tree based on its vruntime. The red-black tree is sorted in such a way that the task that has the smallest vruntime is always at the leftmost node of the tree, making it easy to select the next thread to run. Scanning the tree from left to right gives a timeline of execution for all runnable tasks.
Each thread has a virtual runtime (vruntime) that tracks the amount of CPU time it has received. The scheduler selects the thread with the smallest vruntime to run next.
CFS dynamically adjusts the vruntime of threads based on their priority, ensuring that higher-priority threads receive more CPU time.