A Survey Of Applying Data Mining


02 Nov 2017


Abstract

E-learning lets learners interact with each other as well as with teachers. A virtual learning environment (VLE) mediates the interaction between student and teacher. Rapid progress in information technology has made it easy to share information resources, and the development of e-learning systems has opened a new way of distributing knowledge around the world without restriction. An e-learning system shares a huge volume of resources across many locations. Its retrieval system holds varied information that the learners in a course can share and exchange, and resources stored on different servers can be retrieved. There is therefore a need to measure resource sharing with a specific technique. The data mining clustering technique is used to retrieve the resources of the e-learning system and to measure how they are shared.

In the first application of clustering methods in e-learning, a network-based testing and diagnostic system was implemented. It entails a multiple-criteria test-sheet-generating problem and a dynamic programming approach to generate test sheets. The proposed approach employs fuzzy logic theory to determine the difficulty levels of test items according to the learning status and personal features of each student, and then applies an Artificial Neural Network model, Fuzzy Adaptive Resonance Theory (Fuzzy ART), to cluster the test items into groups, as well as dynamic programming for test sheet construction. An in-depth study has also been carried out describing the usability of Artificial Neural Networks and, more specifically, of Kohonen's Self-Organizing Maps (SOM) for the evaluation of students in a tutorial supervisor (TS) system, as well as the ability of a fuzzy TS to adapt question difficulty in the evaluation process. Another investigation examined how Data Mining techniques could be successfully incorporated into e-learning environments and how this could improve the learning processes. There, data clustering is suggested as a means to promote group-based collaborative learning and to provide incremental student diagnosis. In a further study, user actions associated with students' Web usage were gathered and preprocessed as part of a Data Mining process. The Expectation Maximization (EM) algorithm was then used to group the users into clusters according to their behaviors. These results could be used by teachers to provide specialized advice to students belonging to each cluster. The simplifying assumption that students belonging to each cluster share Web usage behavior makes personalization strategies more scalable. System administrators could also benefit from this acquired knowledge by adjusting the e-learning environment they manage accordingly.

The EM algorithm was also the method of choice where clustering was used to discover user behavior patterns in collaborative activities in e-learning applications. Some researchers propose the use of clustering techniques to group similar course materials. An ontology-based tool, within a Web Semantics framework, was implemented with the goal of helping e-learning users to find and organize distributed courseware resources. One element of this tool was an implementation of the Bisection K-Means algorithm, used for grouping similar learning materials. Kohonen's well-known SOM algorithm was used in [14] to devise an intelligent searching tool that clusters similar learning material into classes, based on its semantic similarities. Clustering was also proposed to group similar learning documents based on their topics and similarities. A Document Index Graph (DIG) for document representation was introduced, and some classical clustering algorithms (Hierarchical Agglomerative Clustering, Single Pass Clustering and k-NN) were implemented. Different variants of the Generative Topographic Mapping (GTM) model, a probabilistic alternative to SOM, were used for the clustering and visualization of multivariate data concerning the behavior of the students of a virtual course. More specifically, a variant of GTM known to behave robustly in the presence of atypical data or outliers was used to successfully identify clusters of students with atypical learning behaviors. A different variant of GTM for feature relevance determination was used to rank the available data features according to their relevance for the definition of student clusters.

E-Learning Systems

An e-learning system combines knowledge acquisition with resource sharing. It can enable better allocation of resources and organize the learning processes so as to improve the student's learning experience and increase their knowledge. E-learning systems increase resource sharing by providing resource material to learners. Resources are centralized on servers located worldwide. They span many disciplines, and related resources are grouped by course according to the learner's wishes. The learner can select course material as text, slides (PPT), or video related to the course. With the data mining clustering technique, resource material can be retrieved quickly, easily, and efficiently.

Theory of e-Learning

The e-learning system accumulates a vast amount of information that is valuable for learner development. Learning management systems accumulate a great deal of data on student activities: they can record whatever activities students are involved in, such as reading, writing, taking tests, performing various tasks, and even communicating with others. The e-learning system facilitates communication between educators, resource sharing, production of content material, preparation of assignments, and synchronous learning through forums, chat, and news services. These facilities improve the learner's knowledge by giving access to open resources without waste of resources.

Applying Data Mining Technique

Data mining, or knowledge discovery in databases (KDD), is the automatic extraction of implicit and interesting patterns from large data collections. Data mining is a multidisciplinary area in which several computing paradigms converge, such as decision tree construction and rule induction. Well-known algorithms include K-means, SVM, Apriori, PageRank, AdaBoost, kNN, Naïve Bayes, and CART. Some of the most useful data mining tasks and methods are statistics, visualization, clustering, classification, association rule mining, sequential pattern mining, and text mining.

Educational Data Mining

Due to the large quantities of data in e-learning systems, it is very difficult for educators to analyze them manually. Educational Data Mining (EDM) is an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students and the settings in which they learn, as defined by the Educational Data Mining community. Research efforts in the field include reviews of the many applications of Data Mining to e-learning, of the history and current trends of EDM, and of the most relevant studies carried out in the field. The relation of data mining techniques to the fields of Artificial Intelligence (AI) and Machine Learning (ML) has been highlighted in many studies. Applications are classified into four main areas: 1) improving student models, which provide detailed information about a student's characteristics; 2) discovering models of the knowledge structure of the domain; 3) studying the pedagogical support provided by learning software; and 4) scientific discovery about learning and learners. The first three categories are universal across different fields of data mining; the others are particularly related to educational data mining.

Statistics and visualization

Web mining

Text mining

Prediction

Clustering

Relationship mining

Distillation of data for human judgment

Discovery with models

Using Data Mining Technique

In e-learning, clustering has been used to find clusters of students with similar learning characteristics, to promote group-based learning, and to provide learner diagnosis. The RapidMiner system has several clustering algorithms available; K-Means has been used here. Clustering techniques apply when the instances of data are to be divided into natural groups. In the k-means algorithm, the number of clusters is specified in advance, prior to application of the algorithm. Good reviews of the different data mining clustering techniques are available in the literature.

Data mining is used in e-learning systems from which large volumes of resources can be downloaded. Clustering is one of the data mining techniques used in the e-learning system for the distribution of these resources.

Concept Of Clustering

Clustering is defined as the division of data into groups of similar objects. Clustering is used to search for the relevant data that the learner requests when the resource is not found on the related server. It also helps to gather the relevant resources from the various servers quickly.

Clustering Technique

Clustering techniques are broadly classified into two kinds: partitioning and hierarchical. Hierarchical clustering is subdivided into agglomerative and divisive. Hierarchical algorithms build clusters gradually, whereas partitioning algorithms learn clusters directly. Partitioning algorithms either try to discover clusters by iteratively relocating points between subsets, or try to identify clusters as areas highly populated with data. Partitioning relocation methods are further categorized into probabilistic clustering, k-medoids methods, and k-means methods. Such methods concentrate on how well points fit into their clusters and tend to build clusters of proper convex shapes.

Partitioning algorithms of the second type are density-based. They try to discover dense connected components of the data, which are flexible in terms of shape, and they rely on density-based connectivity. They are less sensitive to outliers and can discover clusters of irregular shapes. They usually work with low-dimensional data of numerical attributes, known as spatial data. Spatial objects can include not only points but also extended objects.

K-Means Clustering Technique

In k-means clustering, we first determine the number of clusters k and assume the centers (centroids) of these clusters. Any random objects can be taken as the initial centroids. The k-means algorithm then repeats the three steps below until convergence, that is, until the grouping is stable and no object moves to another group:

1. Determine the centroid coordinates.
2. Determine the distance of each object to the centroids.
3. Group the objects based on minimum distance.
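The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in any of the surveyed systems: the function names, the `seed` and `max_iter` parameters, and the sample points are all assumptions made for the example.

```python
import math
import random


def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of points; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Take k random objects as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Measure distances and group each object with its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # stable: no object moved to another group
            break
        labels = new_labels
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Illustrative data only: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2, seed=1)
```

On data this clearly separated, the first three points end up in one cluster and the last three in the other. On real data the result can depend on the initial centroids, so in practice the algorithm is often run with several seeds.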

Distance calculation

Euclidean distance is the most commonly used distance measure. In most cases, when people speak of distance they mean Euclidean distance. Euclidean distance, or simply "distance", is the square root of the sum of squared differences between the coordinates of a pair of objects.

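Point A has coordinates (0, 3, 4, 5) and point B has coordinates (7, 6, 3, -1). The Euclidean distance between points A and B is

$$d_{AB} = \sqrt{(0-7)^2 + (3-6)^2 + (4-3)^2 + (5-(-1))^2} = \sqrt{49 + 9 + 1 + 36} = \sqrt{95} \approx 9.747$$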

Euclidean distance is a special case of Minkowski distance with the order parameter set to 2.
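As a quick check of this relationship, the sketch below computes the Minkowski distance between the points A and B given above for two different orders. The function name `minkowski` and its `order` parameter are naming choices made for this example.

```python
def minkowski(p, q, order):
    """Minkowski distance of the given order between two equal-length points."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)


A = (0, 3, 4, 5)
B = (7, 6, 3, -1)

# Order 2 gives the Euclidean distance: sqrt(49 + 9 + 1 + 36) = sqrt(95).
euclid = minkowski(A, B, 2)
# Order 1 gives the city-block (Manhattan) distance: 7 + 3 + 1 + 6 = 17.
city_block = minkowski(A, B, 1)
```

Changing only the order switches between distance measures, which is why Euclidean distance is described as one member of the Minkowski family.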

K-MEANS ALGORITHM

k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k ≤ n) S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$

where μi is the mean of points in Si.

Classification - Tree Induction Results

The Tree Induction algorithm used to predict students' potential performance focused on student assessment and e-learning usage. Pessimistic pruning was applied to ensure that the confidence levels obtained for predictions on the training data are similar to the actual confidence levels obtained on unseen data. A set of rules was generated from the decision tree operator, showing interesting information about the supervised classification of the students. These rules distinguish at least three main categories of students: students with a low number of assignments are classified as FAIL; students with a medium number of assignments are classified as FAIL or PASS depending on their quiz score; and students with a high number of assignments are classified as FAIL, PASS or EXCELLENT depending on, among other things, the number of actions in the e-learning system.

Figure: The rules generated from the Tree Induction Algorithm operator.
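The shape of the rule set described above can be written out as nested conditions. The text gives no cut-off values, so every threshold in this sketch is a placeholder invented for illustration, as are the function and key names. Only the branching structure follows the description.

```python
def classify_student(assignments, quiz_score, actions, t):
    """Return FAIL, PASS or EXCELLENT following the rule structure in the text.

    `t` is a dict of cut-off values; all keys and values are hypothetical,
    because the source reports no thresholds.
    """
    if assignments < t["medium_assignments"]:
        return "FAIL"  # low number of assignments
    if assignments < t["high_assignments"]:
        # Medium number of assignments: decided by the quiz score.
        return "PASS" if quiz_score >= t["pass_quiz"] else "FAIL"
    # High number of assignments: decided by activity in the system.
    if actions >= t["excellent_actions"]:
        return "EXCELLENT"
    return "PASS" if actions >= t["pass_actions"] else "FAIL"


# Hypothetical thresholds, for illustration only.
thresholds = {
    "medium_assignments": 3,
    "high_assignments": 6,
    "pass_quiz": 50,
    "pass_actions": 100,
    "excellent_actions": 300,
}

outcome = classify_student(assignments=7, quiz_score=0, actions=350, t=thresholds)
```

A real decision tree would learn both the thresholds and the order of the tests from the training data, and pruning could remove some branches altogether.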

K-Means Results

The educator can use the cluster centroids from the K-Means results to group students into three types: very active students (cluster 0), active students (cluster 1) and non-active students (cluster 2). This information helps the educator to group students for working together in collaborative activities. Students were divided into 3 groups based on their activities in the e-learning system. Cluster 0 is characterized by the most active students, with a high number of assignments, who participated in the online quiz and have a moderate number of discussion and forum reads. Cluster 1 is characterized by moderately active students, with a moderate number of assignments, who may or may not have participated in the online quiz, and with a low number of actions in the system.

Implementation of k-means

Some implementations of k-means only allow numerical values for attributes, in which case it may be necessary to convert the data set into the standard spreadsheet format and convert categorical attributes to binary. It may also be necessary to normalize the values of attributes that are measured on substantially different scales. The k-means implementation considered here, however, automatically handles a mixture of categorical and numerical attributes. Furthermore, it automatically normalizes numerical attributes when doing distance computations. The simple k-means algorithm uses the Euclidean distance measure to compute distances between instances and clusters. To perform clustering, we select the clustering algorithm; in this case we select simple K-means. The seed value is used to generate a random number, which in turn is used for making the initial assignment of instances to clusters. In general, k-means is quite sensitive to how clusters are initially assigned. Once the options have been specified, we can run the clustering algorithm. The result shows the centroid of each cluster as well as statistics on the number and percentage of instances assigned to the different clusters. The centroids can be used to characterize the clusters. The clusters can also be examined by visualizing the cluster assignments: we can choose the cluster number and any of the other attributes for each of the three different dimensions available.
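For implementations that do not preprocess automatically, the two conversions mentioned above can be sketched as follows. The helper names and the sample attributes are assumptions made for this example. Min-max scaling is only one common way to normalize; the source does not say which method is used.

```python
def min_max_normalize(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def to_binary(column):
    """Convert a categorical column into one 0/1 column per category."""
    categories = sorted(set(column))
    return {c: [1 if v == c else 0 for v in column] for c in categories}


# Hypothetical attributes: number of logins and preferred material type.
logins = [10, 250, 130]
material = ["video", "text", "video"]

scaled = min_max_normalize(logins)  # [0.0, 1.0, 0.5]
encoded = to_binary(material)       # {"text": [0, 1, 0], "video": [1, 0, 1]}
```

Without rescaling, an attribute with a large range such as the login count would dominate the Euclidean distance and largely decide the cluster assignment on its own.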

Fuzzy Logic-Based Methods

These methods have only recently taken their first steps in the e-learning field. For example, a neuro-fuzzy model for the evaluation of students in an intelligent tutoring system (ITS) has been presented. Fuzzy theory was used to measure and transform the interaction between the student and the ITS into linguistic terms. Then, Artificial Neural Networks were trained to realize fuzzy relations operated with the max-min composition. These fuzzy relations represent the estimation made by human tutors of the degree of association between an observed response and a student characteristic. A fuzzy group-decision approach to assist users and domain experts in the evaluation of educational Web sites was realized in the EWSE system. In further work by Hwang and colleagues, a fuzzy rules-based method for eliciting and integrating system management knowledge was proposed and served as the basis for the design of an intelligent management system for monitoring educational Web servers. This system is capable of predicting and handling possible failures of educational Web servers, improving their stability and reliability. It assists students' self-assessment and provides them with suggestions based on fuzzy reasoning techniques. A two-phase fuzzy mining and learning algorithm has also been described. It integrates an association rule mining algorithm, called Apriori, with fuzzy set theory to find embedded information that could be fed back to teachers for refining or reorganizing the teaching materials and tests. In a second phase, it uses an inductive learning algorithm of the AQ family, AQR, to find the concept descriptions indicating the missing concepts during students' learning. The results of this phase could also be fed back to teachers for refining or reorganizing the learning path.
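The max-min composition mentioned above combines two fuzzy relations by taking, for each pair of outer elements, the maximum over the intermediate elements of the minimum of the two membership degrees. The sketch below shows the operation alone, without the neural network that the surveyed work trains to realize it. The matrices are hypothetical membership degrees chosen for the example.

```python
def max_min_composition(R, S):
    """Compose fuzzy relations R (n x m) and S (m x p), given as nested lists.

    Entry (i, j) of the result is the max over k of min(R[i][k], S[k][j]).
    """
    return [
        [max(min(r_ik, s_kj) for r_ik, s_kj in zip(row, col)) for col in zip(*S)]
        for row in R
    ]


# Hypothetical membership degrees, for illustration only.
R = [[0.2, 0.8],
     [0.6, 0.4]]
S = [[0.5, 0.9],
     [0.7, 0.3]]

T = max_min_composition(R, S)  # [[0.7, 0.3], [0.5, 0.6]]
```

For instance, the first entry is max(min(0.2, 0.5), min(0.8, 0.7)) = max(0.2, 0.7) = 0.7.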

The proposed applied data mining system

DISCUSSION

Several data mining techniques, namely Attribute Clustering (K-Means), Classification, and Association Mining (Apriori, FPGrowth, Create Association Rule, GSP), were applied to e-learning summarization tables. In applying clustering methods, the goal was to split the data set into groups of data points that naturally group together. Student actions were clustered in order to investigate patterns of student behavior in the e-learning system. The K-Means algorithm, which starts with no prior knowledge about groups in the data, was used to define the clusters. In using prediction techniques, the goal was to develop a model that can infer the predicted variable from predictor variables. An inductive decision tree algorithm was selected as the classification method. The predicted variable was the categorical variable Final Mark, and the target was to identify the variables that significantly affect the Final Mark. In applying association rule mining (ARM), the goal was to discover relationships between variables. The FPGrowth, Create Association Rule and Apriori algorithms were selected as association rule mining techniques. The rules obtained can be read as follows: if some set of variable values is found, another variable has a high chance of taking a specific value.
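The "chance" attached to such a rule is usually quantified by its support and confidence. The sketch below computes both for a single rule over a handful of made-up student records. The records, item names, and function names are assumptions made for illustration, not data or code from the study.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated chance of the consequent given that the antecedent holds."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical student records, each a set of attribute=value items.
records = [
    {"assignments=high", "quiz=yes", "mark=PASS"},
    {"assignments=high", "quiz=yes", "mark=PASS"},
    {"assignments=high", "quiz=no", "mark=FAIL"},
    {"assignments=low", "quiz=no", "mark=FAIL"},
]

# Rule: assignments=high -> mark=PASS
rule_support = support(records, {"assignments=high", "mark=PASS"})
rule_confidence = confidence(records, {"assignments=high"}, {"mark=PASS"})
```

Here the rule holds in two of the four records (support 0.5) and in two of the three records with a high number of assignments (confidence 2/3). Algorithms such as Apriori and FPGrowth search for all rules whose support and confidence exceed chosen minimums.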

CONCLUSIONS

In this research, a data mining model for e-learning data was proposed, based on several techniques: Attribute Clustering (K-Means), Classification, and Association Mining (Apriori, FPGrowth, Create Association Rule, GSP). This educational data mining work made it possible to identify and locate e-learning processes that need improvement, as well as those that perform very well and could be used as good examples. The educational data mining investigated in this research allows the learning and teaching processes to be analyzed and better understood. The experimental results have shown that the data mining model presented was able to obtain comprehensible, actionable and logical feedback from the LMS data describing students' learning behavior patterns. This work concentrated on the overall LMS performance at Epoka University and on the mining process of e-learning data. Mining the e-learning data made it possible to identify the most effective teaching approaches, which can be used to enhance the education process. To further test the effectiveness of the proposed model and to increase the generality of this research, more extensive experiments should be conducted using larger LMS data sets.


