Disk Scheduling Applying Different Self Learning Algorithms

Published Date: 02 Nov 2017

Although microprocessor speeds have increased rapidly, disk I/O remains a major performance bottleneck, because seek times and rotational speeds of disks have not improved at a comparable rate[3]. The shortest seek first (SSF) technique serves next the I/O request whose cylinder is closest to the current head position. The shortest time first (STF) technique serves the request with the shortest access time, considering both seek time and rotational delay[8]. The disk scheduler plays a major role in servicing I/O operations: its task is to fetch I/O requests from the file system and forward them to physical storage. The performance of the I/O system can therefore be improved by changing the design and implementation of the I/O scheduler. It is very difficult to identify a single scheduler that is optimal in every situation, because the performance of a disk scheduler depends on factors such as the type of storage system, the type of I/O requests, and the processor architecture.
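As a minimal illustration of shortest seek first, the following Python sketch repeatedly picks the pending request whose cylinder is closest to the current head position. The function names `ssf_order` and `total_seek` and the sample cylinder numbers are assumptions made for illustration; rotational delay, which STF would also take into account, is ignored.

```python
def ssf_order(head, requests):
    """Return the service order chosen by shortest seek first.

    head: current cylinder of the disk head (illustrative value).
    requests: cylinders of the pending I/O requests.
    """
    pending = list(requests)
    order = []
    while pending:
        # Pick the request with the smallest seek distance from the head.
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest  # the head now rests on the serviced cylinder
    return order


def total_seek(head, order):
    """Total number of cylinders travelled when serving `order` from `head`."""
    distance = 0
    for cyl in order:
        distance += abs(cyl - head)
        head = cyl
    return distance


order = ssf_order(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order, total_seek(53, order))
```

Because the choice is greedy, requests far from the head can wait a long time while nearer requests keep arriving, which is one reason no single policy suits every workload.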

To achieve the best performance, workload classification, selection of the scheduling policy, and identification of the storage system need to be automated[1]. This paper attempts to improve disk I/O performance by automatically switching to the scheduler best suited to the I/O operations under consideration. We propose self-learning disk scheduling algorithms that learn the type of workload, switch among scheduling policies according to the workload type, and select the optimal policy, thereby improving I/O system performance. The system uses a workload generated by a standard tool as its input I/O requests.

2. Related Work

Classic I/O schedulers[5] in Linux 2.6 are described below:

Anticipatory[2]: Disk scheduling has been a core function of operating systems since the earliest computers. As processes issue requests, the requests are scheduled for disk I/O. The process that issued the current request is the most likely to issue the next one. A work-conserving disk scheduler, however, never leaves the disk idle while requests are pending: as soon as one request completes, it dispatches another. It therefore decides too early, assuming that the process which issued the current request has no further requests, and selects a request from some other process. It thus suffers from a condition called deceptive idleness[4] and becomes incapable of consecutively servicing more than one request from any process. Because the data requested by a process is commonly, and advantageously, laid out sequentially on disk, deceptive idleness forces a seek-optimizing scheduler to serve requests from different processes alternately, one at a time. Anticipatory scheduling solves this problem. After completing a request, the scheduler waits for a short period in case the next request comes from the same process; serving that request takes less time than immediately switching to a request from another process. The benefit grows with the number of consecutive requests served from the same process, and switching between processes is minimized.
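The waiting decision described above can be sketched as follows. This is a simplified illustration, not the Linux implementation: the names `Request`, `should_wait` and `pick`, the `expected_gap_ms` estimate, and the 6 ms default window are all assumptions.

```python
from collections import namedtuple

# pid: issuing process, cylinder: target cylinder (both illustrative)
Request = namedtuple("Request", ["pid", "cylinder"])


def should_wait(last, pending, expected_gap_ms, max_wait_ms=6.0):
    """Decide whether to keep the disk idle briefly after serving `last`.

    expected_gap_ms is a hypothetical estimate of how soon the process
    that issued `last` will send its next request.
    """
    # A request from the same process is already queued: nothing to wait for.
    if any(r.pid == last.pid for r in pending):
        return False
    # Wait only if the next request is expected within the allowed window.
    return expected_gap_ms <= max_wait_ms


def pick(last, pending):
    """Prefer requests from the process served last, then the shortest seek."""
    same_process = [r for r in pending if r.pid == last.pid]
    candidates = same_process or pending
    return min(candidates, key=lambda r: abs(r.cylinder - last.cylinder))
```

A work-conserving scheduler corresponds to `should_wait` always returning `False`, which is exactly the case in which deceptive idleness arises.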

The next scheduler in Linux 2.6 is the Deadline scheduler[7]. It maintains two types of lists: sort lists and FIFO lists. There are two sort lists, one containing read requests and the other containing write requests; they are sorted by the logical block numbers of the requested data, hence the name. The two FIFO lists, one for reads and one for writes, keep requests ordered by deadline. When a request arrives it is assigned an expiration time, called its deadline, and the scheduler aims to serve the request before that time. Read requests are generally served much earlier than write requests, because the expiration time of a read request is one tenth that of a write request. Because processes issuing read requests are served quickly, this scheduler is not suited to distributing I/O resources equally among the processes waiting for I/O. The expiration time assigned to a request is also not always met: other factors, such as the priority of the request or its position in the queue, may prevent the deadline from being honored.
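The interplay of the sort lists and FIFO lists can be sketched with a small Python class. This is a simplified model under stated assumptions: the class name `DeadlineSketch`, the 500 ms and 5000 ms expiration values, and the rule of always preferring reads are illustrative choices, and batching and write-starvation handling are left out.

```python
import bisect
from collections import deque

READ_EXPIRE_MS = 500    # assumed read deadline
WRITE_EXPIRE_MS = 5000  # assumed write deadline, ten times the read deadline


class DeadlineSketch:
    def __init__(self):
        # One sort list (by block number) and one FIFO (by deadline) per direction.
        self.sorted = {"read": [], "write": []}
        self.fifo = {"read": deque(), "write": deque()}

    def add(self, kind, block, now_ms):
        expire = now_ms + (READ_EXPIRE_MS if kind == "read" else WRITE_EXPIRE_MS)
        bisect.insort(self.sorted[kind], block)
        self.fifo[kind].append((expire, block))

    def dispatch(self, now_ms):
        """Return (kind, block) of the next request, or None when idle."""
        # First serve a request whose deadline has passed, reads before writes.
        for kind in ("read", "write"):
            if self.fifo[kind] and self.fifo[kind][0][0] <= now_ms:
                _, block = self.fifo[kind].popleft()
                self.sorted[kind].remove(block)
                return kind, block
        # Otherwise serve in block order, again preferring reads.
        for kind in ("read", "write"):
            if self.sorted[kind]:
                block = self.sorted[kind].pop(0)
                entry = next(e for e in self.fifo[kind] if e[1] == block)
                self.fifo[kind].remove(entry)
                return kind, block
        return None
```

For example, a read for block 300 queued at time 0 is served before a later read for block 50 once its 500 ms deadline has passed, even though block order alone would favor block 50.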

The Completely Fair Queuing (CFQ) scheduler assigns I/O resources fairly among all waiting processes. It does so by maintaining a queue for every category of process that issues I/O requests. A process's category is determined by its process group ID, thread ID, user ID, or group ID. On enqueue, the category ID is used to insert the request into the corresponding queue. The dequeue operation selects requests, sorts them, and places them on the dispatch list, from which they are sent to the disk controller. The tunable parameter quantum controls the number of requests fetched from each process category. All process categories share the available I/O bandwidth equally. This scheduler is used mainly in database applications that do not require real-time response. It also provides better I/O system utilization than the Deadline scheduler.
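A rough sketch of the enqueue and dequeue steps, assuming requests are represented simply by their block numbers. The class name `FairQueueSketch` and the method names are hypothetical; only the per-category queues, the quantum, and the sorted dispatch list come from the description above.

```python
from collections import OrderedDict, deque


class FairQueueSketch:
    def __init__(self, quantum=2):
        self.quantum = quantum        # max requests taken from a category per round
        self.queues = OrderedDict()   # category id -> FIFO of block numbers

    def enqueue(self, category, block):
        # The category id selects the queue the request is inserted into.
        self.queues.setdefault(category, deque()).append(block)

    def dispatch_round(self):
        """Visit every category once and build one dispatch list."""
        dispatch = []
        for category in list(self.queues):
            queue = self.queues[category]
            for _ in range(min(self.quantum, len(queue))):
                dispatch.append(queue.popleft())
            if not queue:
                del self.queues[category]
        # Sort the batch by block number before sending it to the device.
        dispatch.sort()
        return dispatch
```

With a quantum of 2, a category holding three requests contributes two of them to the first round and the third to the next, so a busy category cannot crowd out a category with a single request.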

In Linux, the I/O scheduler mediates between the block I/O layer and the device driver. The file system and the memory management module send requests using the functions provided by the block I/O layer. The disk I/O scheduler transforms these requests and then passes them to the low-level device drivers.

3. Programmer's design

Problem Statement: To design a system of disk scheduling algorithms that train themselves using machine learning techniques and, based on this analysis, select the optimal scheduler for the type of disk I/O requests.

Existing System:

Linux 2.6 provides the classic schedulers Anticipatory[4], Deadline, Noop[5], and Completely Fair Queuing. It is possible to switch between the default scheduler and the others when the workload changes. Machine learning techniques have been applied to improve I/O storage systems, but not to disk schedulers.

Proposed System:

The proposed system consists of self-learning schedulers that learn, model, and classify the workload, and select the best scheduler based on measured performance.

3.1. Mathematical Model

A two-dimensional array Pdisk(t, r) denotes the disk I/O performance over the time interval from t1 to t2, where t denotes throughput and r denotes response time. The disk performance of the system can be modeled as

$$P_{disk}(t, r) = S(f, w, c, d, p, m, i)\,\Big|_{t_1}^{t_2}$$

The symbols used in the equation are explained in Table 1.

Table 1: Symbol Table

Symbol    Meaning
t         Throughput
r         Response time
S         I/O system
f         File system
w         Workload
c         CPU
d         Disk
p         Tunable parameter
m         Miscellaneous factor
i         Disk scheduler
UP        User preference

Two algorithms are proposed:

1. Change-Sensing Round-Robin Algorithm

This approach uses the classic schedulers that already exist in the operating system. All schedulers are executed sequentially, each for a fixed time frame, and performance data is logged separately for each scheduler. The logged data is analysed to select the best scheduler for the current workload, and the selected scheduler is used for the rest of the workload. Another scheduler is selected under the following conditions:

• When a change in the type of workload is detected.

• When system performance drops below a threshold value.

loop starts
    for each scheduler i of the n schedulers in the operating system
        execute(scheduler i)
        log(performance data)
    next scheduler = scheduler maximizing f(performance of scheduler i, preference)
    if (next scheduler != current scheduler) then
        current scheduler = next scheduler
        load(current scheduler)
    while (!(bad performance or workload change))
        wait t seconds
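The pseudocode above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the callbacks `run_for_interval`, `load` and `monitor`, the `prefer` argument standing in for the user preference, and the `max_rounds` bound that makes the loop terminate are all assumptions.

```python
def pick_best(perf_log, prefer="throughput"):
    """perf_log maps scheduler name -> (throughput, response_time)."""
    if prefer == "throughput":
        return max(perf_log, key=lambda name: perf_log[name][0])
    # Otherwise prefer the lowest response time.
    return min(perf_log, key=lambda name: perf_log[name][1])


def change_sensing_round_robin(schedulers, run_for_interval, load, monitor,
                               prefer="throughput", max_rounds=3):
    """run_for_interval(name) -> (throughput, response_time) for one time frame.
    load(name) activates a scheduler.
    monitor(name) waits t seconds and returns True on bad performance
    or a workload change.
    """
    current = None
    history = []
    for _ in range(max_rounds):
        # Trial phase: run every scheduler for one interval and log its performance.
        perf_log = {name: run_for_interval(name) for name in schedulers}
        best = pick_best(perf_log, prefer)
        if best != current:
            current = best
            load(current)
        history.append(current)
        # Keep the winner until performance drops or the workload changes.
        while not monitor(current):
            pass
    return history
```

Passing the measurement and monitoring steps as callbacks keeps the selection logic independent of how performance is actually collected on a given system.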

2. Feedback Learning

Training Phase

In this phase a large quantity of training data is produced in the form of I/O operations and fed to the self-learning module. All schedulers are given the same kind of I/O requests. The throughput and response time of each request are stored in the database, and a classification model of the workload is built from the analysis of this performance data.

loop starts

    for each scheduler i of the n schedulers in the operating system
        train(scheduler i) using disk I/O operations
        train(scheduler i) using I/O operations generated by a standard tool such as IOMeter
        log(performance data)
    generate model for current workload using the self-learning algorithm
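A minimal sketch of the training phase, assuming a hypothetical callback `run(scheduler, workload)` that returns a (throughput, response time) pair. The "model" here is only a lookup table from workload type to the scheduler with the highest mean throughput; it stands in for whichever self-learning algorithm is actually used.

```python
from collections import defaultdict


def train(schedulers, workloads, run):
    """Run every scheduler on every training workload and log the results."""
    log = []
    for scheduler in schedulers:
        for workload in workloads:
            throughput, response = run(scheduler, workload)
            log.append((workload, scheduler, throughput, response))
    return log


def build_model(log):
    """Map each workload type to the scheduler with the highest mean throughput."""
    samples = defaultdict(list)
    for workload, scheduler, throughput, _ in log:
        samples[(workload, scheduler)].append(throughput)
    best = {}
    for (workload, scheduler), values in samples.items():
        mean = sum(values) / len(values)
        if workload not in best or mean > best[workload][1]:
            best[workload] = (scheduler, mean)
    return {workload: scheduler for workload, (scheduler, _) in best.items()}
```

The returned dictionary can then be consulted at runtime to choose a scheduler once an incoming workload has been classified.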

Decision/Feedback Phase

Incoming I/O requests are classified into workload types at runtime using the constructed model. Each classified workload is then assigned the best disk I/O scheduling policy recorded in the learned database, that is, the scheduler found to be optimal for that workload type. Performance data such as workload type, response time, throughput, the scheduler used, and its parameters are stored in the database and used to train the system. The throughput and response time measured for the scheduling policy are sent to the self-learning core for online learning. The feedback phase is used to improve the correctness and completeness of the classification model.

initialize all requests
loop starts
    while size of collected requests <= X
        collect incoming request
    next scheduler = scheduler returned by the model for the specific workload
    if (next scheduler != current scheduler) then
        current scheduler = next scheduler
    log(performance data)
    append collected requests to total requests
    if ((size of total requests) mod Y == 0)
        generate model for current workload using the self-learning algorithm
    clear collected requests
    load(current scheduler)
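The decision/feedback loop can be sketched in Python over a finite stream of requests. The callbacks `classify`, `model_for`, `retrain` and `load` are hypothetical, `batch_size` and `retrain_every` play the roles of X and Y, and loading a scheduler only when the choice changes is a simplifying assumption borrowed from the first algorithm.

```python
def feedback_loop(requests, classify, model_for, retrain, load,
                  batch_size, retrain_every):
    """Process requests in batches and retrain the model periodically.

    classify(batch) -> workload type of the collected requests.
    model_for(workload) -> scheduler suggested by the current model.
    retrain(total) -> rebuilds the model from all requests seen so far.
    """
    current = None
    total = []
    collected = []
    switches = []
    for request in requests:
        collected.append(request)
        if len(collected) < batch_size:
            continue  # keep collecting until the batch is full
        next_scheduler = model_for(classify(collected))
        if next_scheduler != current:
            current = next_scheduler
            load(current)
            switches.append(current)
        total.extend(collected)
        if len(total) % retrain_every == 0:
            retrain(total)  # feedback: refine the model with new data
        collected = []
    return switches
```

Choosing `retrain_every` as a multiple of `batch_size` ensures that the retraining condition is actually reached, since `total` only grows one full batch at a time.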

Different Machine Learning Techniques

Neural Network

A neural network is a system that trains itself from data. Training can be online, where the network is updated as new data arrives, or offline, where it is trained on data collected beforehand. As data propagates forward and errors propagate backward through the network from node to node, the system adjusts the weights of the connections between nodes.

Support Vector Machine

The support vector machine (SVM) is a supervised learning technique that identifies and categorises patterns through data analysis. In its basic form, the training data is separated into two classes by a clear margin. When new data is presented to the SVM, it is assigned to one of the two classes.

Naive Bayes

This approach is based on Bayes' theorem[9]. The Naive Bayes classifier is a probabilistic classifier that is simple to construct because it needs only a limited amount of training data. It assumes that, given the class, the presence of a particular feature is unrelated to the presence of any other feature.
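To make the independence assumption concrete, here is a small categorical Naive Bayes classifier in plain Python. The class name `NaiveBayesSketch`, the use of Laplace smoothing, and the idea of describing a workload by (access pattern, operation type) features are illustrative assumptions, not details taken from the system described here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (label, index) -> value counts
        self.values = defaultdict(set)              # index -> distinct values seen
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[(label, i)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) plus the sum of log P(feature | class);
            # summing the terms is where features are treated as independent.
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.feature_counts[(label, i)][value]
                # Laplace smoothing avoids zero probabilities for unseen values.
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training needs only per-class counts of each feature value, which is why the classifier works with relatively little training data.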

C4.5 Decision Tree Algorithm

The C4.5 decision tree algorithm extends the ID3 algorithm. The decision tree generated by C4.5 is used for classification. C4.5 prunes unnecessary branches and so reduces the size of the tree.
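C4.5 chooses the attribute to split on by gain ratio, that is, information gain normalized by the split information. The sketch below computes that criterion for categorical features only; tree construction, handling of continuous attributes, and pruning are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, labels, index):
    """Information gain of splitting on feature `index`, divided by split info."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[index], []).append(label)
    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / total * math.log2(len(g) / total)
                      for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def best_split(rows, labels):
    """Index of the feature with the highest gain ratio."""
    return max(range(len(rows[0])), key=lambda i: gain_ratio(rows, labels, i))
```

A feature that separates the classes perfectly into two equal halves has a gain ratio of 1, while a feature that barely changes the class mix scores close to 0.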

3.2. Data independence and Data Flow architecture

The data flow diagram (DFD) at the first level, level 0, is a single circle that represents the whole system. The second level, the level 1 DFD, contains the main modules of the system: the self-learning module, which trains itself and selects the best scheduler for the current workload type; the decision module, which guides scheduler selection; and the log database, which records the performance data of all processes. External inputs are shown as rectangles and processes as ellipses or circles. Decomposition can stop at the level where each process represents a small part of the implementation, that is, where each ellipse or circle can be mapped to a programmable function.

Figure 1: Level 0 DFD

Figure 2: Level 1 DFD

Figure 3: Level 2 DFD

3.3. Turing Machine

The following state diagram shows the states reached by the system during execution. IOMeter generates I/O requests. These requests are collected and categorised according to their features. In the first approach, all existing classic schedulers are applied to these requests, each executed in turn for a specific interval. Based on the performance data for a chunk of requests, the best scheduler is selected for the rest of the workload. In the second approach, the training phase first trains the system using disk I/O operations and synthetic I/O requests, and performance data in the form of response time and throughput is stored in the database. The self-learning module generates a model that gives the best scheduler for the current type of workload. The feedback phase of this algorithm is used to achieve more accurate classification.

Figure 4: State Diagram

4. Conclusion

Self-learning disk scheduling schemes can improve disk scheduling performance. They train themselves automatically, adapt to various types of workloads, and make optimal scheduling decisions.
