Using GA Based Decision


02 Nov 2017


ABSTRACT

Genetic algorithms are used in various ways for intrusion detection. In this paper we propose using a genetic algorithm to evolve decision trees for network intrusion detection. We used GATree to classify the data and the KDD 1999 data set for both training and testing. We compared the results with the machine learning algorithms J48 and simple CART. The experimental results show that the accuracy of GATree is comparable with that of J48 and simple CART, but at the cost of a much longer run time.

Keywords: Genetic Algorithms, Decision tree, Intrusion detection.

1. INTRODUCTION

When a computer system is connected to a network, it is exposed to a high level of risk. Threats to a computer system include viruses, Trojan horses, worms and intrusions. Viruses can be largely controlled by installing antivirus software and updating it regularly.

Any unauthorized access that violates the security policy of a system is called an intrusion. Intrusions cannot be predicted, hence more focus is put on intrusion detection. The sooner we are able to detect an attack, the quicker we can act. Intrusion detection also helps us to collect more information about attacks, which strengthens intrusion prevention methods. Various soft computing techniques such as Genetic Algorithms, Artificial Neural Networks and Fuzzy Logic are used to make an intrusion detection system (IDS) smart enough to detect intrusions as early as possible so that further damage can be avoided.

There are a number of limitations to the prevention based approach to computer network security [1]. First, it is probably impossible to build a completely secure system. Further, the prevention based security philosophy constrains the user’s activity and productivity. Hence intrusion detection systems are designed around two detection techniques, namely anomaly intrusion detection and misuse intrusion detection [2].

Anomaly intrusion detection:

In an anomaly IDS the user’s behaviour is compared with a known standard behaviour to detect any significant deviation from normal behaviour. This approach can be more effective in protecting against unknown or novel attacks, since no prior knowledge about specific intrusions is required. However, it may cause more false positives, because an abnormality can be due to a new normal behaviour [3].
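The comparison against a standard behaviour can be illustrated with a minimal sketch. It assumes that normal behaviour is summarised by the mean and standard deviation of a single measurement and that a deviation is significant beyond a fixed number of standard deviations. The function name, the threshold and the login counts are hypothetical and are not taken from any particular IDS.

```python
from statistics import mean, stdev

def is_anomalous(baseline, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` standard
    deviations away from the mean of the baseline measurements."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A constant baseline: any different value is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold

# Hypothetical baseline: logins per hour for one user on normal days.
baseline_logins = [4, 5, 6, 5, 4, 6, 5, 5]
print(is_anomalous(baseline_logins, 5))   # a typical value
print(is_anomalous(baseline_logins, 40))  # far outside the baseline
print(is_anomalous(baseline_logins, 8))   # a modest change in habits
```

With this baseline, a value of 8 is also flagged even though it may simply reflect a new normal behaviour, which is the false positive problem described above.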

Misuse intrusion detection:

This is the most widely used type of IDS. It uses patterns of known attacks or weak spots of the system to identify known intrusions. The signatures and patterns used to identify attacks consist of various fields in the packet, such as source address, destination address, source and destination ports, and even keywords in the content area of a packet.
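As a rough illustration of signature matching on packet fields, the sketch below checks a packet against a small list of signatures. The signature names, field names and addresses are hypothetical examples invented for this sketch; real misuse detectors use much richer rule languages.

```python
# Hypothetical signatures: every field listed in a signature must match.
SIGNATURES = [
    {"name": "telnet-from-outside", "dst_port": 23, "src_net": "203.0.113."},
    {"name": "passwd-file-request", "dst_port": 80, "keyword": "/etc/passwd"},
]

def match_signatures(packet, signatures=SIGNATURES):
    """Return the names of all signatures that the packet matches."""
    hits = []
    for sig in signatures:
        if "dst_port" in sig and packet["dst_port"] != sig["dst_port"]:
            continue
        if "src_net" in sig and not packet["src_addr"].startswith(sig["src_net"]):
            continue
        if "keyword" in sig and sig["keyword"] not in packet["payload"]:
            continue
        hits.append(sig["name"])
    return hits

packet = {"src_addr": "198.51.100.7", "dst_addr": "192.0.2.10",
          "src_port": 40312, "dst_port": 80,
          "payload": "GET /../../etc/passwd HTTP/1.0"}
print(match_signatures(packet))
```

A packet that matches no signature returns an empty list, which is why this approach cannot detect attacks for which no pattern is known.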

An IDS can also be classified into two categories based on its location [4]: host based and network based. A host based IDS monitors activities associated with a particular host, whereas a network based IDS monitors activities associated with the network.

GA has been found to be the most efficient technique for intrusion detection in terms of detection accuracy, at the expense of time [5]. In this paper we propose using a GA based decision tree for intrusion detection. A decision tree is an example of a classification algorithm. The main advantage of decision trees over other classification algorithms is that they produce a set of rules that are transparent, easy to understand and easily incorporated into real-time technologies such as IDSs and firewalls [6].

2. CLASSIFICATION ALGORITHMS

Classification is a data mining technique used to map data instances into one of several predefined categories. It can be used to detect individual attacks, but it has a high false alarm rate. Algorithms such as decision trees, Bayesian networks, k-nearest neighbour classifiers, case-based reasoning, genetic algorithms and fuzzy logic are used for classification [7]. The classification algorithm is applied to collected audit data, from which it learns to classify new audit data as normal or attack.

2.1. Decision Tree

A decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision. Each path from the root of the tree to a leaf represents a rule that categorizes data according to the values of its attributes. A decision tree consists of nodes, leaves and edges. A node specifies an attribute by which the data is to be partitioned. Each node has a number of edges, each labelled with a possible value of the attribute tested at that node. An edge connects either two nodes or a node and a leaf. Leaves are labelled with a decision value for categorizing the data.
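The structure described above can be sketched as a small data type. The class names and the toy tree below are illustrative assumptions; the tree is hand-written for the example and was not learned from the KDD data.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    decision: str  # decision value, e.g. "normal" or "attack"

@dataclass
class Node:
    attribute: str  # attribute by which the data is partitioned
    # Each edge is labelled with one possible value of `attribute`
    # and leads either to another node or to a leaf.
    edges: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)

def classify(tree, instance):
    """Follow the edge matching the instance's value at each node
    until a leaf is reached, then return its decision."""
    while isinstance(tree, Node):
        tree = tree.edges[instance[tree.attribute]]
    return tree.decision

# Hypothetical hand-written tree over two connection attributes.
toy_tree = Node("protocol_type", {
    "icmp": Leaf("attack"),
    "udp": Leaf("normal"),
    "tcp": Node("flag", {"SF": Leaf("normal"), "S0": Leaf("attack")}),
})
print(classify(toy_tree, {"protocol_type": "tcp", "flag": "S0"}))
```

Each root-to-leaf path reads as a rule, for example "if protocol_type is tcp and flag is S0 then attack", which is what makes such trees easy to inspect.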

One of the greatest advantages of the decision tree classification algorithm is that it does not require users to have much background knowledge of the learning process [8].

There are various decision tree algorithms such as GATree, ID3, C4.5, SLIQ, CHAID and CART [8].

2.1.1 GATree

In this paper, we have used a genetic algorithm based decision tree, GATree, to classify network intrusions. GATree was chosen because of the following advantages [9].

(a) GATree can continue decision tree evolution for as long as needed.

(b) GATree allows the user to select the characteristics of the resulting decision tree. It’s easy to prefer smaller or more accurate trees.

(c) GATree can provide a set of totally different decision trees that are close matches to the solution space. All those trees can be used alternatively to the best-fit one.

(d) There are certain domains where statistical inducers cannot produce optimal trees. GATree can overcome local minima.

In this work the classification rules are generated using GA approach. These rules are then used to classify or detect the infected connections.
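The general idea of evolving decision trees with a genetic algorithm can be sketched as follows. This is a generic toy GA under stated assumptions, not the GATree implementation: attributes are binary, a tree is either a label or a tuple of attribute index and two subtrees, the crossover is deliberately simplified, and all parameter values are hypothetical. The `size_weight` parameter illustrates how a preference for smaller or more accurate trees could be expressed in the fitness function.

```python
import random

LABELS = ("normal", "attack")

def random_tree(rng, n_attrs, depth):
    """A tree is a label (leaf) or a tuple (attribute index, subtree0, subtree1)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LABELS)
    return (rng.randrange(n_attrs),
            random_tree(rng, n_attrs, depth - 1),
            random_tree(rng, n_attrs, depth - 1))

def predict(tree, x):
    while not isinstance(tree, str):
        tree = tree[1 + x[tree[0]]]  # the binary attribute value selects the branch
    return tree

def size(tree):
    return 1 if isinstance(tree, str) else 1 + size(tree[1]) + size(tree[2])

def fitness(tree, data, size_weight):
    """Accuracy minus a size penalty; a larger size_weight favours smaller trees."""
    correct = sum(predict(tree, x) == y for x, y in data)
    return correct / len(data) - size_weight * size(tree)

def crossover(a, b):
    """Simplified crossover: keep a's root and first branch, take b's second branch."""
    if isinstance(a, str) or isinstance(b, str):
        return a
    return (a[0], a[1], b[2])

def mutate(tree, rng, n_attrs):
    """Replace a randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, str) or rng.random() < 0.3:
        return random_tree(rng, n_attrs, 2)
    branch = rng.choice((1, 2))
    new = list(tree)
    new[branch] = mutate(tree[branch], rng, n_attrs)
    return tuple(new)

def evolve(data, n_attrs, generations=30, pop_size=40, p_cross=0.9,
           p_mut=0.2, size_weight=0.01, seed=0):
    rng = random.Random(seed)
    score = lambda t: fitness(t, data, size_weight)
    pop = [random_tree(rng, n_attrs, 3) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]   # the fitter half breeds
        children = [pop[0]]              # elitism: keep the best tree unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b) if rng.random() < p_cross else a
            if rng.random() < p_mut:
                child = mutate(child, rng, n_attrs)
            children.append(child)
        pop = children
    return max(pop, key=score)

# Toy data: the label is "attack" exactly when attribute 0 is 1.
toy = [((a, b, c), "attack" if a else "normal")
       for a in (0, 1) for b in (0, 1) for c in (0, 1)]
best = evolve(toy, n_attrs=3)
print(best, size(best), fitness(best, toy, 0.01))
```

Because the loop can run for any number of generations and the fitness weighs accuracy against size, the sketch reflects advantages (a) and (b) in spirit. It makes no claim about how GATree implements them.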

3. DATA SET

The KDD Cup 99 [10] data set was used to train and test the classifiers. The data set was provided by MIT Lincoln Labs. It contains a wide variety of intrusions simulated in a military network environment, which was set up to acquire nine weeks of raw TCP/IP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The LAN was operated as if it were a true Air Force environment, but peppered with multiple attacks. Hence, this is regarded as a high confidence and high quality data set. Each TCP/IP connection is described by 41 discrete and continuous features (e.g. duration, protocol type, flag) and is labelled either as normal or as an attack with exactly one specific attack type (e.g. Smurf, Perl). Attacks fall into four main categories:

Denial of Service Attacks (DOS) in which an attacker overwhelms the victim host

User to Root Attacks (U2R) in which an attacker tries to obtain access rights from a normal host in order, for instance, to gain root access to the system.

Remote to Local Attacks (R2L) in which the intruder tries to exploit the system vulnerabilities in order to control the remote machine through the network as a local user.

Probing in which an attacker attempts to gather useful information about machines and services available on the network in order to look for exploits.
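Grouping the specific attack labels into these four categories could be done with a simple lookup, sketched below. Only a few well-known KDD Cup 99 labels are listed, so the table is deliberately incomplete, and the function name is a hypothetical choice for this example.

```python
# Partial mapping from KDD Cup 99 attack labels to the four categories.
ATTACK_CATEGORY = {
    "smurf": "DoS", "neptune": "DoS",
    "perl": "U2R", "buffer_overflow": "U2R",
    "guess_passwd": "R2L", "ftp_write": "R2L",
    "portsweep": "Probe", "ipsweep": "Probe",
}

def categorize(label):
    """Map a connection label to "Normal", one of the four attack
    categories, or "Unknown" if the label is not in the partial table."""
    label = label.rstrip(".").lower()  # labels in the data files end with a dot, e.g. "smurf."
    if label == "normal":
        return "Normal"
    return ATTACK_CATEGORY.get(label, "Unknown")

for raw in ("smurf.", "perl.", "normal.", "portsweep."):
    print(raw, "->", categorize(raw))
```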

4. IMPLEMENTATION AND RESULTS

We evaluated three decision tree algorithms: GATree, J48 and simple CART.

For the GATree implementation, we used the GATree software provided by GATree.com. The J48 and simple CART algorithms were run in the open source software Weka.

The following settings were used for the GATree implementation.

Basic Settings:

Generations: 500, Population: 100

Advanced settings:

Crossover probability: 0.99, Crossover Heuristic: Standard random crossover, Mutation probability: 0.01, Mutation Heuristic: Standard Random Heuristic, Random seed initializer: 123456789, Percent of genome replacement: 0.25, Interface updater: 500, Error rate: 0.95

We used tenfold cross-validation for all the algorithms.
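In tenfold cross-validation the data is split into ten parts, and each part is used once for testing while the remaining nine are used for training. A minimal sketch of the index bookkeeping is shown below. It assumes no shuffling or stratification, which a real evaluation would normally add, and the function name is hypothetical.

```python
def k_fold_indices(n_instances, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_instances))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0)
                  for i in range(k)]
    start = 0
    for fold_size in fold_sizes:
        test = indices[start:start + fold_size]
        train = indices[:start] + indices[start + fold_size:]
        yield train, test
        start += fold_size

# Example with the smallest data set size used in the experiments.
folds = list(k_fold_indices(24701, k=10))
print(len(folds), len(folds[0][1]), len(folds[-1][1]))
```

The reported accuracy is then the average over the ten test folds, so every instance is used for testing exactly once.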

Table 1. A comparison

| Sr. No | Data (no. of instances) | Parameter | GATree | J48 |
|---|---|---|---|---|
| 1 | 24701 | Classification accuracy | 98.13% | 99.79% |
| | | Tree size | 12 | 91 |
| 2 | 49402 | Classification accuracy | 98.13% | 99.85% |
| | | Tree size | 10 | 245 |
| 3 | 123505 | Classification accuracy | | 99.92% |
| | | Tree size | | 283 |
| 4 | 247010 | Classification accuracy | | 99.94% |
| | | Tree size | | 467 |
| 5 | 370515 | Classification accuracy | | 99.95% |
| | | Tree size | | 736 |
| 6 | 494020 | Classification accuracy | | 99.96% |
| | | Tree size | | 833 |

5. CONCLUSION

The classification accuracy obtained with GATree is comparable with that of the J48 and simple CART algorithms, but the time consumed by GATree is very large.

GATree provides good accuracy.

GATree produces very small trees whose accuracy is close to that of J48 and simple CART. J48 produces accurate results but with unnecessarily large trees.

GATree generated close-match estimates even with the quantized formats. This is an indication that GATree can produce quality results even in the presence of noise.


