A comparison of I/O Schedulers

What do you think of when you read the term “scheduler”? If you think of the mechanism that schedules the order in which processes are served, then you already have an idea of what this article is about. The term “scheduler” itself is broad, and in this article we will narrow it down to I/O, specifically disk I/O.

The first question to consider is why you need an I/O scheduler. To answer that, let’s briefly see how a disk (ATA/SATA HDD or “hard disk drive”) works. A typical HDD provides two ways of accessing a specific location:

* C/H/S (Cylinder/Head/Sector)
* LBA (Logical Block Address)

Most operating systems, Linux in particular, use LBA, which addresses the disk by sector number only (each sector has a size of 512 bytes), so
the OS doesn’t need to take cylinders and heads into account. When the OS asks for a sector to be read, the disk head moves to the target
sector, stops, fetches the content, puts it in the disk’s buffer, and generates an interrupt. This process is the same for all disk access,
whether it’s for reading or writing. Note that the head moves mechanically, unlike RAM, which has hardwired addresses and is directly
accessible by the processor. So accessing a sector on disk can be hundreds or thousands of times slower than RAM access, and the seek latency
for a specific sector can vary greatly.
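
To make the difference between the two addressing schemes concrete, here is a minimal sketch of the commonly used C/H/S-to-LBA conversion. The geometry constants are hypothetical values picked for illustration; a real disk reports its own geometry.

```python
# Hypothetical disk geometry, chosen only for illustration.
HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63
SECTOR_SIZE = 512  # bytes per sector, as stated in the article


def chs_to_lba(cylinder, head, sector):
    """Convert a C/H/S triple to an LBA. C/H/S sectors are numbered from 1."""
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


def lba_to_byte_offset(lba):
    """Byte offset of a sector, counted from the start of the disk."""
    return lba * SECTOR_SIZE


print(chs_to_lba(0, 0, 1))    # 0: the very first sector
print(chs_to_lba(1, 0, 1))    # 1008: first sector of the second cylinder
print(lba_to_byte_offset(2))  # 1024
```

With LBA, the OS only ever handles the single number on the left-hand side; the drive does the translation internally.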

Also, the mechanical nature of disk head movement causes another issue: more time spent moving the head means longer actual access time.
It’s preferable if all the data can be read with just one sweep, but of course that’s an ideal case.
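
The cost of head movement can be illustrated with a toy model. The sketch below assumes that seek cost is simply proportional to the distance between sector numbers, which ignores rotation and real disk geometry; the sector numbers are made up.

```python
def total_head_travel(start, sectors):
    """Sum of the distances the head moves when visiting sectors in the given order."""
    position, travelled = start, 0
    for sector in sectors:
        travelled += abs(sector - position)
        position = sector
    return travelled


requests = [90, 10, 80, 20, 70]  # hypothetical sector numbers, in arrival order

print(total_head_travel(0, requests))          # 350: served in arrival order
print(total_head_travel(0, sorted(requests)))  # 90: served in one sweep
```

The same five requests cost almost four times as much head travel when they are served in the order they arrived.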

So, given these large variations in seek time, what can be done to decrease read/write latency? This is the problem the I/O scheduler
tries to solve. You can imagine it as a traffic controller. In a nutshell, the scheduler can postpone a request, re-order requests, and
merge adjacent requests into one. The goal of an I/O scheduler is to optimize head movement so that more work gets done in a given time
(that is, higher throughput). With this idea in mind, let’s look at I/O schedulers in more detail.
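
Re-ordering and merging can be sketched in a few lines. This is an illustration only, not the kernel's implementation: requests are modelled as hypothetical (start_sector, sector_count) pairs, and two requests are merged when one ends exactly where the next begins.

```python
def sort_and_merge(requests):
    """Sort (start_sector, sector_count) requests and merge contiguous ones."""
    merged = []
    for start, count in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # The previous request ends exactly where this one starts: merge them.
            prev_start, prev_count = merged[-1]
            merged[-1] = (prev_start, prev_count + count)
        else:
            merged.append((start, count))
    return merged


pending = [(100, 8), (8, 8), (0, 8), (108, 4)]
print(sort_and_merge(pending))  # [(0, 16), (100, 12)]
```

Four requests become two, and the disk can serve each of them without moving the head in between.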

Overview of I/O Schedulers

There are four built-in I/O schedulers ready to use in Linux; they are:

* Noop
* Anticipatory
* Deadline
* CFQ (Completely Fair Queueing)

Except for Noop, all of the above schedulers use the “queueing” concept as a way to manage incoming I/O requests. A queue here doesn’t
necessarily mean that requests will be processed in FIFO (first-in-first-out) order; the queue is just a container that holds
the requests to be manipulated further. A scheduler maintains one or more queues according to its implementation. In general, we see two
queues: the I/O scheduler’s queue and the driver’s queue.

Conceptually, the scheduler’s queue sits between the VFS (Virtual File System) layer and the block device driver layer. Using this approach,
the VFS layer needs no major modification to utilize the schedulers. Because the VFS layer already provides the basic elevator algorithm
(disk access algorithm), the only work needed was to modify that elevator support to provide a generic interface for I/O schedulers. So, requests
are submitted into a predefined I/O queue where the schedulers manage them. The same thing happens when the scheduler submits a request to
the block device driver’s queue. Thanks to this “object-oriented-like” approach, tweaking or even creating a new scheduler doesn’t necessarily
affect how other layers work.
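
The layering described above can be pictured as a small interface. The sketch below is a loose analogy: the class and method names are hypothetical and do not correspond to the kernel's actual elevator interface. The point is that the code feeding the driver's queue does not care which scheduler sits behind the interface.

```python
from abc import ABC, abstractmethod
from collections import deque


class IOScheduler(ABC):
    """Hypothetical generic interface that every scheduler implements."""

    @abstractmethod
    def add_request(self, sector):
        """Accept a request coming from the upper layer."""

    @abstractmethod
    def next_request(self):
        """Return the next request for the driver's queue, or None if empty."""


class FifoScheduler(IOScheduler):
    def __init__(self):
        self.queue = deque()

    def add_request(self, sector):
        self.queue.append(sector)

    def next_request(self):
        return self.queue.popleft() if self.queue else None


class SortedScheduler(IOScheduler):
    def __init__(self):
        self.queue = []

    def add_request(self, sector):
        self.queue.append(sector)
        self.queue.sort()

    def next_request(self):
        return self.queue.pop(0) if self.queue else None


def drain_to_driver(scheduler, sectors):
    """Submit requests to the scheduler's queue, then move them to the driver's queue."""
    for sector in sectors:
        scheduler.add_request(sector)
    driver_queue = []
    request = scheduler.next_request()
    while request is not None:
        driver_queue.append(request)
        request = scheduler.next_request()
    return driver_queue


print(drain_to_driver(FifoScheduler(), [30, 10, 20]))    # [30, 10, 20]
print(drain_to_driver(SortedScheduler(), [30, 10, 20]))  # [10, 20, 30]
```

Swapping one scheduler for another changes only the order in which requests reach the driver, not the code on either side.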

With the latest kernels, you can also change the effective scheduler on the fly at runtime for each available block device. This means that if
you have two disks, they can use different I/O schedulers, so you don’t need to select just one when booting the kernel for all the disks in
the system. Using different schedulers on different block devices is a neat way to adapt to different I/O characteristics.
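
On kernels that expose this setting through sysfs, each block device has a file /sys/block/<device>/queue/scheduler. Reading it lists the available schedulers with the active one in square brackets, and writing a scheduler name to it (as root) switches the device to that scheduler. The following is a minimal sketch under that assumption; the helper names and the device names in the usage note are hypothetical, and the set of schedulers offered depends on your kernel.

```python
from pathlib import Path


def parse_scheduler_line(line):
    """Split e.g. 'noop anticipatory deadline [cfq]' into (available, active)."""
    available, active = [], None
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return available, active


def scheduler_path(device):
    return Path("/sys/block") / device / "queue" / "scheduler"


def get_scheduler(device):
    """Return (available, active) for a block device such as 'sda'."""
    return parse_scheduler_line(scheduler_path(device).read_text())


def set_scheduler(device, name):
    """Switch the scheduler for one device. Requires root privileges."""
    scheduler_path(device).write_text(name)


print(parse_scheduler_line("noop anticipatory deadline [cfq]"))
# (['noop', 'anticipatory', 'deadline', 'cfq'], 'cfq')
```

For example, set_scheduler("sda", "deadline") followed by set_scheduler("sdb", "noop") would give two disks different schedulers without a reboot.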

All I/O schedulers have some tunable parameters that allow users to tweak them according to their needs, and each parameter can change
performance considerably. Let’s take a brief look at each of these I/O schedulers.

Noop

Noop is the simplest scheduler present in Linux. It is mostly suitable for truly random-access block devices, such as flash memory
cards. The incoming requests for read/write are kept in FIFO order, and only the current request added at the end of the queue is tested for
the possibility of merging. The downside is clear: you get no optimization at all, especially in situations where there are many
competing read/write processes. Per-process sequential reads get interleaved and will be seen as “random” access. So, it’s better not to
use this scheduler with any device that benefits from sequential access, such as a hard disk.
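
The Noop behaviour described above can be sketched as follows. This is one interpretation of the description, not kernel code: a new request is compared only with the request currently at the tail of the FIFO, and requests are hypothetical (start_sector, sector_count) pairs.

```python
from collections import deque


class NoopQueue:
    """FIFO queue that tries to merge a new request with the tail only."""

    def __init__(self):
        self.fifo = deque()

    def add(self, start, count):
        if self.fifo:
            last_start, last_count = self.fifo[-1]
            if last_start + last_count == start:
                # The new request continues the last one: merge instead of queueing.
                self.fifo[-1] = (last_start, last_count + count)
                return
        self.fifo.append((start, count))

    def dispatch(self):
        """Hand out requests strictly in arrival order."""
        return self.fifo.popleft() if self.fifo else None


q = NoopQueue()
for request in [(0, 8), (8, 8), (100, 8), (16, 8)]:
    q.add(*request)
print(list(q.fifo))  # [(0, 16), (100, 8), (16, 8)]
```

Note that (16, 8) is not merged with (0, 16) even though the two are contiguous, because another request arrived in between. This is how interleaved sequential readers end up looking random to the disk.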

Anticipatory

The Anticipatory scheduler is designed around the fact that a user tends to access adjacent sectors instead of jumping to other disk areas
during certain periods of time. Most file systems in Linux use a specific approach to ensure that sectors belonging to a file are kept
adjacent as much as possible. Between these reads, it is very likely that the task sleeps for a while before continuing reading.

So, the basic idea behind the Anticipatory scheduler is that it anticipates subsequent reads between sleeps. By staying a bit longer in
the same head position, it can reduce the back-and-forth seek movement to a certain degree. How long it should stay is sometimes determined by
looking into the next incoming read request, or in some cases by forecasting based on statistics. It is clear then that the Anticipatory
scheduler isn’t really great for time-sensitive read/write processes. The CFQ scheduler shares the same drawback, although it is a bit better
because of its “fairness” ability, which we’ll describe later.
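
The anticipation idea can be sketched as a small decision function that runs after a read completes. This is a simplified illustration, not the kernel's heuristic: it uses a fixed waiting window instead of statistics, and the function name, the (pid, sector) request format, and the 6 ms default are assumptions made for the example.

```python
def next_action(last_pid, now_ms, last_done_ms, pending, window_ms=6):
    """Decide what to do after a read from process last_pid has completed.

    pending is a list of (pid, sector) requests in arrival order.
    Returns ("dispatch", request), ("wait", None), or ("idle", None).
    """
    for request in pending:
        if request[0] == last_pid:
            # The anticipated follow-up read has arrived: serve it right away.
            return ("dispatch", request)
    if now_ms - last_done_ms < window_ms:
        # Keep the head where it is, hoping the same process reads nearby soon.
        return ("wait", None)
    if pending:
        # The wait did not pay off: fall back to the oldest pending request.
        return ("dispatch", pending[0])
    return ("idle", None)


print(next_action(1, 2, 0, [(2, 5000)]))            # ('wait', None)
print(next_action(1, 3, 0, [(2, 5000), (1, 108)]))  # ('dispatch', (1, 108))
print(next_action(1, 10, 0, [(2, 5000)]))           # ('dispatch', (2, 5000))
```

The first call shows the cost of the approach: process 2 has a request ready, but it is held back while the scheduler waits for process 1.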

Deadline

If you have a lot of requests, you may want to make sure a request is serviced within a strict period of time. This situation is quite
common for a busy file server, but you can see it in desktop workloads, too (e.g., graphics rendering). The Deadline scheduler was created to
address this need. Read and write requests are guaranteed to be serviced within a strict amount of time, and reads are prioritized over
writes. So, the Deadline scheduler pays attention to latency issues but not necessarily throughput.

Read is favored over write, because read is assumed to be more important; however, write operations aren’t allowed to be starved too
much. This behavior can also be tweaked with some tunables, which we’ll talk more about later.
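
A heavily simplified sketch of the deadline idea follows. It assumes one pending list, integer millisecond timestamps, and illustrative expiry values of 500 ms for reads and 5000 ms for writes; the class and its names are hypothetical. Here the preference for reads comes only from their shorter expiry time, and a real scheduler is more elaborate than this.

```python
READ_EXPIRE_MS = 500    # illustrative value: reads must be served sooner
WRITE_EXPIRE_MS = 5000  # illustrative value: writes may wait longer


class DeadlineQueue:
    def __init__(self):
        self.pending = []  # (deadline_ms, sector, op) tuples

    def add(self, now_ms, sector, op):
        expire = READ_EXPIRE_MS if op == "read" else WRITE_EXPIRE_MS
        self.pending.append((now_ms + expire, sector, op))

    def dispatch(self, now_ms, head):
        if not self.pending:
            return None
        oldest = min(self.pending)  # request with the earliest deadline
        if oldest[0] <= now_ms:
            choice = oldest  # a deadline has passed: serve it first
        else:
            # Nobody is late: continue the sweep from the head position,
            # wrapping around to the lowest sector if nothing lies ahead.
            ahead = [r for r in self.pending if r[1] >= head]
            choice = min(ahead or self.pending, key=lambda r: r[1])
        self.pending.remove(choice)
        return choice


q = DeadlineQueue()
q.add(0, 900, "read")
q.add(0, 100, "write")
q.add(400, 200, "read")
print(q.dispatch(now_ms=600, head=0))    # (500, 900, 'read'): expired, served first
print(q.dispatch(now_ms=600, head=900))  # (5000, 100, 'write'): sweep wraps around
```

The read at sector 900 is the farthest from the head, yet it is served first because its deadline has already passed.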

The Deadline scheduler is a sound fit for high-performance I/O, but it decreases interactivity to a certain degree. For someone who
demands a smooth desktop experience, therefore, this might not be the scheduler to choose.

CFQ

CFQ is the default scheduler used in the latest kernel versions, and it is typically suited for people seeking balanced I/O service. Stressing
“fairness”, it tries to manage read/write processes using priorities so that nothing dominates or fully dictates the way the kernel serves
these requests. This is done by dividing the bandwidth of each block device fairly among the processes performing I/O to that device. Lately
it has also been improved with time slice-based operation, meaning it does what process schedulers do: it serves a request queue for a specific
duration, stops, switches to another one, and repeats.
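
The round-robin idea behind CFQ can be sketched like this. It is a simplification under stated assumptions: a "slice" is counted in requests rather than in time, priorities are ignored, and the function name and (pid, sector) request format are hypothetical.

```python
from collections import deque


def cfq_dispatch_order(requests, slice_size=2):
    """Serve per-process queues in turn, at most slice_size requests per turn.

    requests is a list of (pid, sector) pairs in arrival order.
    """
    queues = {}
    for pid, sector in requests:
        queues.setdefault(pid, deque()).append(sector)

    order = []
    while queues:
        for pid in list(queues):
            queue = queues[pid]
            for _ in range(min(slice_size, len(queue))):
                order.append((pid, queue.popleft()))
            if not queue:
                del queues[pid]  # this process has nothing left to serve
    return order


arrivals = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("B", 50), ("C", 90), ("C", 91)]
print(cfq_dispatch_order(arrivals))
# [('A', 1), ('A', 2), ('B', 50), ('C', 90), ('C', 91), ('A', 3), ('A', 4)]
```

Process A submitted the most requests, but it cannot monopolize the disk: B and C get their turn before A's remaining requests are served.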

When a process submits a request for disk access, the request can be classified as synchronous or asynchronous. In the case of synchronous
requests, the process waits for the request’s completion; for asynchronous requests, it simply sends the request to disk and starts
doing some other work. Because a process is blocked while its synchronous request is pending, the CFQ scheduler gives more time/priority to
synchronous requests than to asynchronous requests. CFQ gives the best balance (theoretically) between throughput and interactivity.

For more information, please visit the original post at www.samag.com

Keep note: This article is a completely forwarded version. For issues please contact me at joseph#admonORG
