OSTEP Chapters 6 & 7
How does your computer create the illusion of running dozens of applications simultaneously when it only has a few physical cores?
Wait, I forgot the question because I am now checking my email. Ok, back to it...
The answer is CPU Virtualization. Chapters 6 and 7 of OSTEP explore the engine behind this illusion and how to balance raw performance with absolute control.
The OSTEP textbook is freely available on Remzi's website if you'd like to follow along.
Chapter 6. The Mechanism: Limited Direct Execution
The crux of the challenge is: how do we run programs efficiently without letting them take over the machine?
The solution is Limited Direct Execution (LDE); the title spoils it. "Direct Execution" means the program runs natively on the CPU for maximum speed. "Limited" means the OS retains the authority to stop the process and block restricted operations. This requires some hardware support.
To prevent chaos, hardware provides two execution modes. Applications run in "User Mode", where they cannot perform privileged actions like I/O. The OS runs in "Kernel Mode" with full access to the machine. When a user program needs a privileged action, it initiates a "System Call". This triggers a 'trap' instruction that jumps into the kernel and raises the privilege level. To ensure security, the OS programs a "trap table" at boot time, telling the hardware exactly which code to run for each event.
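To see the trap in action, here is a minimal sketch (assuming Linux and glibc) that invokes the write system call through the raw syscall(2) wrapper; under the hood, the wrapper executes the trap instruction that switches into kernel mode:

```c
// Minimal sketch (Linux/glibc): invoking a system call "by hand".
// syscall(2) executes the architecture's trap instruction (e.g., `syscall`
// on x86-64), which raises the privilege level and jumps to the handler
// the kernel installed in the trap table at boot.
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    const char msg[] = "hello from user mode\n";
    // Same effect as write(1, msg, ...), but makes the syscall number visible.
    syscall(SYS_write, 1, msg, sizeof(msg) - 1);
    return 0;
}
```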
If a process enters an infinite loop, how does the OS get the CPU back?
- The Cooperative Approach: Older systems (like early Mac OS) trusted processes to yield the CPU periodically. If a program locked up, you had to reboot.
- The Non-Cooperative Approach: Modern systems use a "Timer Interrupt". The hardware raises an interrupt every few milliseconds, forcefully halting the process and handing control back to the OS.
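You can't write the kernel's timer handler in user space, but POSIX interval timers give a rough analogue. In this sketch, setitimer(2) fires SIGALRM every 10 ms, yanking control away from a loop that never yields voluntarily:

```c
// User-space analogue of the timer interrupt (a sketch, not kernel code):
// setitimer(2) delivers SIGALRM every 10 ms, forcibly interrupting the
// spinning loop below the way a hardware timer interrupts a process.
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks = 0;

static void on_tick(int sig) {
    (void)sig;
    ticks++;               // "the OS regains control"
    if (ticks >= 100)      // ~1 second: put the runaway loop out of its misery
        _exit(0);
}

int main(void) {
    struct sigaction sa = { .sa_handler = on_tick };
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval it = {
        .it_interval = { .tv_sec = 0, .tv_usec = 10000 },  // every 10 ms
        .it_value    = { .tv_sec = 0, .tv_usec = 10000 },
    };
    setitimer(ITIMER_REAL, &it, NULL);

    for (;;) { /* "infinite loop": never yields voluntarily */ }
}
```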
Finally, when the OS regains control and decides to switch to a different process, it executes a "context switch". This low-level assembly routine saves the current process's registers to its kernel stack and restores the next process's registers. By switching the stack pointer, the OS tricks the hardware: the 'return-from-trap' instruction returns into the new process instead of the old one.
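The real routine lives in kernel assembly (xv6's swtch is a good read), but the ucontext(3) API offers a user-space sketch of the same trick: save one set of registers and a stack pointer, load another, and execution resumes somewhere else entirely.

```c
// User-space sketch of a context switch via the (deprecated but
// illustrative) ucontext API: swapcontext() saves the current registers
// into one context and restores another, much as the kernel's switch
// routine does with per-process kernel stacks.
#include <stdio.h>
#include <ucontext.h>

static ucontext_t ctx_main, ctx_other;
static char other_stack[64 * 1024];   // the "other process's" stack

static void other(void) {
    printf("running on the other stack\n");
    swapcontext(&ctx_other, &ctx_main);   // switch back to main
}

int main(void) {
    getcontext(&ctx_other);
    ctx_other.uc_stack.ss_sp = other_stack;
    ctx_other.uc_stack.ss_size = sizeof(other_stack);
    ctx_other.uc_link = &ctx_main;        // where to go if other() returns
    makecontext(&ctx_other, other, 0);

    printf("switching away from main\n");
    swapcontext(&ctx_main, &ctx_other);   // save main's registers, load other's
    printf("back in main\n");
    return 0;
}
```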
Chapter 7. The Policy: CPU Scheduling
With the switching mechanism from Chapter 6 in place, the next question is which process to run. Chapter 7 explores scheduling policies, initially assuming all jobs arrive at once and have known run times.
First, let's look at batch scheduling using "Turnaround Time". This metric is simply the time a job completes minus the time it arrived (T_turnaround = T_completion - T_arrival). Now let's consider some batch scheduling policies under this metric (a small numeric comparison follows the list):
- FIFO (First In, First Out): Simple, but suffers from the "convoy effect". If a short job gets stuck behind a long one, average turnaround time suffers.
- SJF (Shortest Job First): To fix the convoy effect, SJF runs the shortest job first. This is optimal for turnaround time if all jobs arrive at once, but it fails when a short job arrives after a long one has already started.
- STCF (Shortest Time-to-Completion First): Adding preemption gives us STCF. When a new job arrives, the scheduler runs whichever job has the least time remaining. This is provably optimal for average turnaround time (see Fig. 7.5).
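Here's the numeric comparison, replaying OSTEP's own example of one 100 ms job and two 10 ms jobs arriving together (the job lengths are the only assumption; the arithmetic is a sketch):

```c
// Back-of-the-envelope turnaround comparison: one 100 ms job and two
// 10 ms jobs, all arriving at t = 0. Turnaround = T_completion - T_arrival.
#include <stdio.h>

static double avg_turnaround(const int *runtimes, int n) {
    int t = 0, total = 0;
    for (int i = 0; i < n; i++) {
        t += runtimes[i];   // job i completes at time t
        total += t;         // arrival is 0, so turnaround == completion
    }
    return (double)total / n;
}

int main(void) {
    int fifo[] = {100, 10, 10};  // long job first: the convoy
    int sjf[]  = {10, 10, 100};  // shortest jobs first
    printf("FIFO avg turnaround: %.1f ms\n", avg_turnaround(fifo, 3));  // 110.0
    printf("SJF  avg turnaround: %.1f ms\n", avg_turnaround(sjf, 3));   // 50.0
    return 0;
}
```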
Now, we consider interactivity using "Response Time". This is the time from when a job arrives to the first time it is scheduled (T_response = T_firstrun - T_arrival).
STCF is great for turnaround but terrible for response time; a user might wait seconds for their interactive job (say a terminal session) to start. Round Robin solves this by time-slicing: it runs a job for a set quantum (e.g., 10 ms) and then switches. This makes the system feel responsive.
However, Round Robin creates a trade-off. While it optimizes fairness and response time, it destroys turnaround time by stretching out the completion of every job; you cannot have your cake and eat it too (see Figs. 7.6 and 7.7).
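The trade-off is easy to quantify. This sketch replays the setup of Figs. 7.6 and 7.7 (three 5 ms jobs arriving at t = 0, RR quantum of 1 ms) and computes both metrics:

```c
// Sketch comparing SJF and Round Robin on three 5 ms jobs arriving at
// t = 0, quantum = 1 ms for RR (the setup of OSTEP Figs. 7.6/7.7).
// Response = T_firstrun - T_arrival; turnaround = T_completion - T_arrival.
#include <stdio.h>

#define N 3
#define LEN 5
#define QUANTUM 1

int main(void) {
    // SJF: equal lengths, so it degenerates to run-to-completion in order.
    double sjf_resp = 0, sjf_turn = 0;
    for (int i = 0, t = 0; i < N; i++) {
        sjf_resp += t;        // a job first runs when its predecessor finishes
        t += LEN;
        sjf_turn += t;
    }

    // Round Robin: rotate through the jobs, one quantum at a time.
    int remaining[N] = {LEN, LEN, LEN};
    int first_run[N] = {-1, -1, -1};
    double rr_resp = 0, rr_turn = 0;
    int t = 0, done = 0;
    while (done < N) {
        for (int i = 0; i < N; i++) {
            if (remaining[i] == 0) continue;
            if (first_run[i] < 0) { first_run[i] = t; rr_resp += t; }
            remaining[i] -= QUANTUM;
            t += QUANTUM;
            if (remaining[i] == 0) { rr_turn += t; done++; }
        }
    }

    printf("SJF: avg response %.1f ms, avg turnaround %.1f ms\n",
           sjf_resp / N, sjf_turn / N);   // 5.0 and 10.0
    printf("RR : avg response %.1f ms, avg turnaround %.1f ms\n",
           rr_resp / N, rr_turn / N);     // 1.0 and 14.0
    return 0;
}
```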
Finally, real programs perform I/O. When a process blocks waiting for the disk, the scheduler treats each CPU burst between I/O requests as an independent "sub-job". By running another process during the wait, the OS maximizes "overlap" and system utilization.
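As a rough analogue (threads standing in for processes, usleep standing in for a disk read), the wall-clock win from overlap looks like this; expect roughly max(io, cpu) rather than io + cpu:

```c
// Sketch of I/O/CPU overlap (simulated): one thread "waits on disk"
// (usleep) while the main thread keeps the CPU busy, so total wall-clock
// time is about max(io, cpu) instead of io + cpu.
// build: cc overlap.c -lpthread
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void *fake_io(void *arg) {
    (void)arg;
    usleep(200 * 1000);   // 200 ms pretend disk read
    return NULL;
}

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void busy_work(void) {
    volatile unsigned long x = 0;
    double start = now();
    while (now() - start < 0.2) x++;   // ~200 ms of CPU work
}

int main(void) {
    double t0 = now();
    pthread_t io;
    pthread_create(&io, NULL, fake_io, NULL);  // "process A blocks on I/O"
    busy_work();                               // "process B uses the CPU"
    pthread_join(io, NULL);
    printf("overlapped: %.2f s (vs ~0.40 s sequential)\n", now() - t0);
    return 0;
}
```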
There is one last assumption we did not relax: the OS does not actually know how long a job will run. This "No Oracle" problem sets the stage for the next chapter on the "Multi-Level Feedback Queue", which predicts the future by observing the past.
To conclude Chapter 7, it is worth remembering that there is no silver bullet. The best policy depends entirely on the workload: the more you know about what you are running, the better you can schedule it.