Empirical Support For Weighted Majority


Ms. Parneeta Sidhu, Mr. Aditya Bindal, Mr. Hunny Chugh, Dr M.P.S. Bhatia

Netaji Subhas Institute of Technology,

Azad Hind Fauj Marg,Sector-3,Dwarka, New Delhi-110075

[email protected], Mob:09871425044

Abstract: This paper describes experimental results from applying Weighted Majority, EDDM and DWM based algorithms to datasets that contain different types of concept drift. These three algorithms have been studied extensively in the theoretical machine learning literature. Here, we show that these algorithms can be quite competitive in practice and can improve the accuracy and speed of handling and identifying drifts in data. We also discuss theoretical bounds on the weight reduction of an ensemble in the case of misclassification in DWM. Based on the state of the art and our experimental results, we also identify the open issues in these areas.

Keywords: Weighted Majority, Dynamic Weighted Majority, Early Drift Detection Method, Data Stream Mining, Machine Learning

Introduction

Many incremental and online algorithms have been developed over the years to handle changes of concept in data amounting to millions of records. Multiplicative weight-updating algorithms such as the Weighted Majority variants (DeSantis et al., 1988; Littlestone and Warmuth, 1994; Cesa-Bianchi et al., 1993) and Dynamic Weighted Majority (Kolter and Maloof, 2003, 2007) have been studied extensively in the theoretical machine learning literature, where a collection of strong properties of each of these algorithms has been proven. EDDM (Baena-García et al., 2006), a variant of DDM (Gama et al., 2004), re-develops a model after a drift has been detected and was introduced to handle slow gradual changes in the concept underlying the distribution. All these algorithms can provide good results in the face of irrelevant features, noise, or any change of the target function over time. In this paper, we add evidence of the value of these algorithms for handling feature change or class-conditional change, in terms of improvements in accuracy and speed.

The contribution of our paper is threefold. First, we show how Weighted Majority, EDDM and DWM can be applied to data sets containing different types of drift, from slow gradual changes to abrupt drifts, drifts with noise, and drifts with irrelevant features. Second, we calculate upper and lower bounds on the reduction in weights when DWM misclassifies a new example. Third, we list several open issues for these existing algorithms with a view to making them more time- and space-efficient.

Background Knowledge

From a historical survey of the machine learning literature, there are two broad categories of concept-drift algorithms: incremental approaches and online approaches. According to Polikar et al. (2001), incremental algorithms preserve previously acquired useful knowledge for further classification and hence can be effective in the case of recurrent drifts. They also accommodate new classes that may be introduced with new data and hence can handle new concepts well. However, they require large data chunks for analysing new concepts in order to train the existing classifiers well, and it is difficult to identify an appropriate chunk size that achieves good performance. They can store chunks of data for offline processing, and so take more time and memory to update the classifiers than real-time processing allows. These algorithms can also be inefficient because they handle drifts in a non-continuous way. In this paper, we discuss the Weighted Majority algorithm (Littlestone and Warmuth, 1994), a well-established incremental algorithm developed to handle changes of context.

On the other hand, we have online algorithmic approaches to studying the changing distribution underlying a data stream. Online learning [11] is concerned with learning from each data example separately as the system operates (usually in real time), and the data may exist only for a short time. After observing each data example, the system changes its structure to optimise the goal function. There are two main types of online learning approaches: approaches that use a mechanism to detect drifts, and approaches that do not explicitly detect drifts. The former, drift-detection approach uses some distribution measure correlated with the accuracy of the classifier, such as the mean, median or standard deviation. Algorithms that follow this type of approach respond quickly to drifts but suffer from inaccurate drift detections. These algorithms rebuild the model and discard the earlier one, and hence are not good at handling recurrent drifts.

The latter type of online algorithm maintains an ensemble of classifiers that remembers the past history of concept descriptions. Weights are associated with each ensemble member based on its accuracy in classifying the new instance, weak experts are pruned and newly learnt experts are added, and the existing classifiers continue to be trained. However, these approaches take longer to recover from drifts. Theoretically, they are believed to be better at handling recurrent drifts. In this paper we experimentally compare EDDM (an online approach that explicitly uses a measure related to accuracy) and DWM (an online approach that does not explicitly detect drifts). Our paper demonstrates each of these features of the algorithms in practice and performs a comparative analysis of the three approaches.

Description of the Algorithms

Weighted Majority Algorithm (Incremental Algorithm, for irrelevant attributes)

We have a set of experts, each representing a combination of either a pair or a triplet of features. The weighted majority algorithm (Littlestone and Warmuth, 1994) is based on the idea that not all features are necessary to make a prediction; rather, a subset of features suffices. Instances may contain many irrelevant features that do not contribute to any prediction but increase the time and memory usage of any algorithm. In this algorithm, we assign a positive weight to each expert of the ensemble. When an expert makes an incorrect prediction, its weight is decreased by a factor β, which automatically gives a higher relative weight to an expert that made a correct prediction. Based on whichever is larger of the total weight of the experts (ε0) that predicted 0 as the binary output and the total weight of the experts (ε1) that predicted 1, the global algorithm takes a weighted majority vote to make the final prediction. The global algorithm also rectifies the predictions of the ensemble members that predicted incorrectly, and this training continues for every new instance.

A natural modification to the weighted majority algorithm is to discard experts whose weights become very low, so as to speed up the algorithm as it learns (in contrast to most learning algorithms, which slow down as they learn). Discarding, however, may make it impossible to retain experts that could not predict correctly in an earlier phase but begin giving correct predictions at a later stage. As a result, the algorithm may train and learn from experts that are not technically correct, further reducing the accuracy and speed of learning.
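To make the voting and update rule concrete, the following is a minimal sketch of weighted majority prediction for binary labels, including the optional pruning modification discussed above. The class name WeightedMajority, the parameter names beta and theta, and the assumption that each expert exposes a predict(x) method returning 0 or 1 are illustrative choices, not part of any particular implementation.

# Minimal sketch of Weighted Majority voting for binary labels.
# Assumes each expert object exposes predict(x) -> 0 or 1; the names
# WeightedMajority, beta and theta are illustrative only.

class WeightedMajority:
    def __init__(self, experts, beta=0.5, theta=0.01, prune=False):
        self.experts = list(experts)
        self.weights = [1.0] * len(self.experts)
        self.beta = beta          # multiplicative penalty on a mistake
        self.theta = theta        # optional pruning threshold
        self.prune = prune

    def predict(self, x):
        # Weighted vote: sum the weights behind each binary label.
        votes = [0.0, 0.0]
        for e, w in zip(self.experts, self.weights):
            votes[e.predict(x)] += w
        return 0 if votes[0] >= votes[1] else 1

    def update(self, x, y):
        # Penalise every expert that predicted the wrong label.
        for i, e in enumerate(self.experts):
            if e.predict(x) != y:
                self.weights[i] *= self.beta
        if self.prune:
            # Optional modification discussed above: drop very weak experts.
            keep = [i for i, w in enumerate(self.weights) if w >= self.theta]
            self.experts = [self.experts[i] for i in keep]
            self.weights = [self.weights[i] for i in keep]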

Early Drift Detection Method (Algorithm that explicitly handles drifts)

EDDM was developed to handle very slow gradual changes, in addition to data streams that contain gradual and abrupt drifts. It is based on the distance between classification errors, captured by two measures: the average distance between two errors (p'i) and its standard deviation (s'i). The algorithm is thus based on the number of examples between two classification errors, as against DDM (Gama et al., 2004), which is based on the number of classification errors. If this distance is large, the concept is stable and the new instances belong to the same target concept as before. A possible change is signalled by a decrease in the quantity

V = (p'i + 2·s'i) / (p'max + 2·s'max), where p'max + 2·s'max

corresponds to the point where the distribution of distances between errors reaches its maximum. The authors define two levels:

- a warning level (α, 0 < α < 1), which signals a possible context change when V < α, and

- a drift level (β, 0 < β < 1), at which a drift is assumed to have occurred when V < β; the model is then reset using the examples stored since the warning level was reached.

The max values for p’ and s’ are also reset.

EDDM's approach to drift detection completely forgets the earlier history of predictions, and the model is reset every time the drift level is reached. As a result, the approach is not well suited to recurrent or predictable drifts. However, if p' and s' keep decreasing at every time step, the model takes a minimum of 1/α time steps to reach the warning level and 1/β time steps to reach the drift level. Hence, a model is kept for a minimum of 1/β time steps before it is reset. As a result, EDDM can handle, and keep learning under, slow gradual changes very well.
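As an illustration of the detection rule above, the following is a rough sketch of the EDDM drift signal, assuming one prediction outcome is fed in per example. The class name EDDMSketch, the Welford-style update of the running mean and standard deviation, and the 30-error warm-up before any check is made are illustrative implementation choices rather than details taken from the original paper.

# Rough sketch of the EDDM drift signal described above.
# mean_dist and the derived std track the running mean and standard
# deviation of the distance (in examples) between consecutive errors;
# alpha and beta are the warning and drift thresholds (e.g. 0.95, 0.90).
import math

class EDDMSketch:
    def __init__(self, alpha=0.95, beta=0.90):
        self.alpha, self.beta = alpha, beta
        self.reset()

    def reset(self):
        self.n_errors = 0
        self.last_error_at = 0
        self.t = 0
        self.mean_dist = 0.0      # running p'
        self.m2 = 0.0             # running sum of squared deviations
        self.max_level = 0.0      # p'max + 2 * s'max

    def add(self, correct):
        """Feed one prediction outcome; returns 'ok', 'warning' or 'drift'."""
        self.t += 1
        if correct:
            return "ok"
        # Update running mean / std of distances between errors (Welford).
        dist = self.t - self.last_error_at
        self.last_error_at = self.t
        self.n_errors += 1
        delta = dist - self.mean_dist
        self.mean_dist += delta / self.n_errors
        self.m2 += delta * (dist - self.mean_dist)
        std = math.sqrt(self.m2 / self.n_errors)
        level = self.mean_dist + 2.0 * std
        self.max_level = max(self.max_level, level)
        if self.n_errors < 30 or self.max_level == 0.0:
            return "ok"                      # not enough evidence yet
        v = level / self.max_level           # the ratio V in the text
        if v < self.beta:
            self.reset()                     # drift: rebuild the model
            return "drift"
        return "warning" if v < self.alpha else "ok"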

Dynamic Weighted Majority (Algorithm that does not explicitly handle drifts)

DWM is an online learner consisting of an ensemble of experts that makes a prediction for each new instance. It maintains a weighted pool of experts, each with an initial weight of one. When an instance arrives, the experts make predictions based on previous experience. If an expert predicts incorrectly, its weight is decreased by a multiplicative factor β. Regardless of the correctness of the predictions, DWM uses each learner's prediction and its weight to compute a weighted sum for each class. The class with the highest weight is set as the global prediction. If, after multiple incorrect global predictions, an expert's weight falls below a threshold θ, DWM removes that expert from the ensemble and introduces a new expert with an initial weight of one. A parameter p, which defines the period during which no weights can be updated and no expert can be removed or created, makes DWM suitable for large and noisy data sets. However, the training of all the experts is a continuous process that happens throughout the flow of the data stream.

In DWM, the weights of experts are only ever decreased; no change in weight is made when a prediction is correct. DWM therefore normalizes the weights of the experts by scaling them uniformly so that, after the transformation, the maximum weight is one. This prevents a newly added expert from dominating the predictions.
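The following is a minimal sketch of the DWM loop described above, assuming a base_learner_factory that returns classifiers with train(x, y) and predict(x) methods. The names are illustrative, and this is a sketch rather than the MOA implementation used in our experiments.

# Minimal sketch of the DWM update loop described above.
# base_learner_factory is assumed to return a classifier with train(x, y)
# and predict(x) methods; all names here are illustrative only.
from collections import defaultdict

class DWMSketch:
    def __init__(self, base_learner_factory, beta=0.5, theta=0.01, p=50):
        self.make_learner = base_learner_factory
        self.beta, self.theta, self.p = beta, theta, p
        self.experts = [self.make_learner()]
        self.weights = [1.0]
        self.t = 0

    def _global_prediction(self, x):
        # Weighted sum per class; the heaviest class is the global prediction.
        scores = defaultdict(float)
        for e, w in zip(self.experts, self.weights):
            scores[e.predict(x)] += w
        return max(scores, key=scores.get)

    def learn_one(self, x, y):
        self.t += 1
        update_period = (self.t % self.p == 0)
        # Penalise experts that err, but only on update periods.
        for i, e in enumerate(self.experts):
            if e.predict(x) != y and update_period:
                self.weights[i] *= self.beta
        global_pred = self._global_prediction(x)
        if update_period:
            # Normalise so the largest weight is one.
            m = max(self.weights)
            self.weights = [w / m for w in self.weights]
            # Remove weak experts; add a fresh one after a global mistake.
            keep = [i for i, w in enumerate(self.weights) if w >= self.theta]
            self.experts = [self.experts[i] for i in keep]
            self.weights = [self.weights[i] for i in keep]
            if global_pred != y:
                self.experts.append(self.make_learner())
                self.weights.append(1.0)
        # All experts are trained on every example.
        for e in self.experts:
            e.train(x, y)
        return global_pred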

The Learning Problem

For our experimental evaluation we work with four datasets that exhibit different types of drift. We compare the results of EDDM, DWM and Weighted Majority on the following datasets:

STAGGER Concepts : Abrupt Concept drift, noise free examples

SEA Datasets: Large data set, Abrupt concept drift with Noise

Rotating Hyperplane dataset : Gradual drift with Noise

Sine1g : Very slow gradual drift without noise

STAGGER Concepts (Schlimmer and Granger, 1986)

STAGGER concepts provide a standard benchmark for evaluating a learner’s performance and would help us to generate empirical support for the various above mentioned concept drifting algorithms.

Each example in a STAGGER Concept consists of three attribute values:

color ∈ {green, blue, red},

shape ∈ {triangle, circle, rectangle}, and

size ∈ {small, medium, large}.

The presentation of training examples lasts for 120 time steps, and at each time step the learner receives one example. In this data set, we evaluate a learner whose experts use at most a pair of features, and in each context at least one of the features is irrelevant.

In the first context (first 40 time steps), only the examples satisfying the description size = small ∧ color = red are classified as positive. In this context, only size and color are relevant attributes and shape is irrelevant.

In the second context (next 40 time steps), the concept description is defined by two relevant attributes, color = green ∨ shape = circle; size is an irrelevant attribute.

In the third context (last 40 time steps), the examples are classified as positive if size = medium ∨ size = large. In this concept of STAGGER, both shape and color form the irrelevant feature set.

To evaluate a drift detection algorithm, at each time step we randomly generate 100 examples of the current target concept, present them to the performance element, and compute the percentage correctly predicted. In our experiments, we repeated this procedure 30 times and averaged the correct predictions over these runs.
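The following is an illustrative generator for the STAGGER concepts and the evaluation protocol described above. The function and attribute names are ours, the concept schedule follows the dataset description, and the learner is assumed to expose a predict(x) method.

# Illustrative generator for the STAGGER concepts and the evaluation
# protocol described above; names are illustrative only.
import random

COLORS = ["green", "blue", "red"]
SHAPES = ["triangle", "circle", "rectangle"]
SIZES = ["small", "medium", "large"]

def stagger_label(x, context):
    color, shape, size = x
    if context == 0:                        # size=small AND color=red
        return int(size == "small" and color == "red")
    if context == 1:                        # color=green OR shape=circle
        return int(color == "green" or shape == "circle")
    return int(size in ("medium", "large")) # size=medium OR size=large

def stagger_example(context, rng=random):
    x = (rng.choice(COLORS), rng.choice(SHAPES), rng.choice(SIZES))
    return x, stagger_label(x, context)

def evaluate(learner, context, n_test=100, rng=random):
    """Percentage of a fresh test sample the learner predicts correctly."""
    hits = sum(learner.predict(x) == y
               for x, y in (stagger_example(context, rng) for _ in range(n_test)))
    return 100.0 * hits / n_test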

SEA Concepts (Street and Kim , 2001)

The SEA concepts provide a benchmark of a very large dataset with concept drift for evaluating the predictive accuracy of the various concept-drift algorithms. Each example in the SEA concepts consists of three real-valued attributes, xi ∈ R such that 0.0 ≤ xi ≤ 10.0. The target concept is y = [[x0 + x1 ≤ θ]], where θ ∈ {7, 8, 9, 9.5} for the four data blocks. Each example belongs to class 1 if x0 + x1 ≤ θ and to class 0 otherwise. Thus, only the first two attributes are relevant and the third attribute is irrelevant.

The presentation of training examples lasts for 50,000 time steps. For the first quarter (i.e., 12,500 time steps), the target concept uses θ = 8; for the second, θ = 9; the third, θ = 7; and the last, θ = 9.5. For each of these four periods, we randomly generated a training set consisting of 10,000 examples. At each time step, we presented each method with one example, tested the resulting concept descriptions using the examples in the test set, and computed the percentage of correct predictions. We repeated this procedure 30 times, averaging accuracy over these runs. We also computed 95% confidence intervals.
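As a sketch of the data just described, the following generator produces SEA examples with the θ schedule 8, 9, 7, 9.5 over blocks of 12,500 time steps; the optional noise argument models the class noise used later in the experiments, and all names are illustrative.

# Illustrative generator for the SEA concepts as described above:
# three real-valued attributes in [0, 10], label 1 iff x0 + x1 <= theta,
# with theta switching through 8, 9, 7, 9.5 every 12,500 time steps.
import random

SEA_THETAS = [8.0, 9.0, 7.0, 9.5]

def sea_example(t, block_len=12_500, noise=0.0, rng=random):
    theta = SEA_THETAS[min(t // block_len, 3)]
    x = [rng.uniform(0.0, 10.0) for _ in range(3)]
    y = int(x[0] + x[1] <= theta)
    if noise and rng.random() < noise:      # e.g. noise=0.10 for 10% class noise
        y = 1 - y
    return x, y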

Moving HYPERPLANE (Hulten et al. 2001)

To provide empirical support for the performance of EDDM, DWM and Weighted Majority in an environment that includes drifting concepts along with noise and gradual changes, the Hyperplane dataset was used.

A hyperplane in d-dimensional space is the set of points x ∈ [0,1]^d that satisfy Σ_{i=1}^{d} a_i x_i = a_0, where x_i is the i-th coordinate of x and a_i is the weight of the moving hyperplane in dimension i. The examples are classified according to the following constraints:

Σ_{i=1}^{d} a_i x_i ≥ a_0 for positive examples, (1)

Σ_{i=1}^{d} a_i x_i < a_0 for negative examples. (2)

Hyperplanes are considered useful for simulating drifting concepts because their orientation and position can be changed easily by simply changing the relative weights. We introduce change to this dataset by adding drift to each weight, a_i = a_i + μ_i·α, where μ_i is the direction of change for each weight and α is the probability that the direction of change is reversed. The threshold a_0 is calculated as a_0 = ½ Σ_{i=1}^{d} a_i at each time step, so that roughly half of the examples are positive. Sudden and significant changes were introduced by reversing the sign of the inequality in constraint (1) every N examples. Noise was then introduced by switching the labels of 5% of the training examples.

At each time step, an online learning system was trained with one example and then was tested with 100 examples generated randomly according to the current concept. The total number of training examples was 3000.
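The following is an illustrative sketch of the moving hyperplane stream as described above: the weights drift by a small step each example, each drift direction may reverse, the threshold a_0 is recomputed as half the sum of the weights, 5% label noise is added, and the inequality may be reversed every flip_every examples. The step size and reversal probability are illustrative parameter choices, not the exact values used by Hulten et al.

# Illustrative sketch of the rotating hyperplane stream described above.
# The step size, reversal probability and flip_every are illustrative.
import random

def hyperplane_stream(d=10, n_examples=3000, step=0.001,
                      reverse_prob=0.3, noise=0.05, flip_every=None, rng=random):
    a = [rng.random() for _ in range(d)]
    mu = [1.0] * d                          # drift direction per weight
    flipped = False
    for t in range(n_examples):
        if flip_every and t > 0 and t % flip_every == 0:
            flipped = not flipped           # sudden change: reverse the inequality
        x = [rng.random() for _ in range(d)]
        a0 = 0.5 * sum(a)                   # keeps roughly half the examples positive
        positive = sum(ai * xi for ai, xi in zip(a, x)) >= a0
        y = int(positive != flipped)
        if rng.random() < noise:            # 5% label noise as in the experiments
            y = 1 - y
        yield x, y
        # Gradual drift of the hyperplane weights.
        for i in range(d):
            if rng.random() < reverse_prob:
                mu[i] = -mu[i]
            a[i] += mu[i] * step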

Sine1g Dataset

There are many real-world problems with slow gradual drifts. To compare the performance of the various concept-drift algorithms on this type of drift, we use an artificial dataset, Sine1g. The attributes in the Sine1 dataset have values uniformly distributed in [0, 1]. In the first concept, classification is positive if a point lies below the curve y = sin(x) and negative otherwise. After the drift, the classification is reversed. Sine1g remains the same as Sine1 (which contains an abrupt drift), but the concept drift is created by gradually selecting examples from the old and the new concepts, so there is a transition time between the concepts.

At each time step, an online learning system was trained with one example and then was tested with 100 examples generated randomly according to the current concept. The total number of training examples was 500.
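A sketch of the Sine1g stream described above is given below; the position of the drift and the length of the transition window are illustrative choices, and all names are ours.

# Illustrative sketch of the Sine1g stream: two uniform attributes in [0, 1],
# label positive when the point lies below y = sin(x); after the drift the
# labelling is reversed, and during a transition window examples are drawn
# from the old or new concept with a gradually shifting probability.
import math, random

def sine1g_stream(n_examples=500, drift_at=250, window=100, rng=random):
    for t in range(n_examples):
        x, y = rng.random(), rng.random()
        label = int(y < math.sin(x))        # old concept
        # Probability of drawing from the new (reversed) concept grows
        # linearly across the transition window.
        p_new = min(max((t - drift_at) / window, 0.0), 1.0)
        if rng.random() < p_new:
            label = 1 - label               # new concept: classification reversed
        yield (x, y), label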

Empirical Study and Results

We ran the three online ensemble algorithms EDDM (Baena-García et al., 2006), DWM (Kolter and Maloof, 2003, 2007) and Blum's implementation of Weighted Majority (1997) on the STAGGER, SEA, HYPERPLANE and Sine1g data sets. To evaluate a learner (using pairs or singletons of features as experts), at each time step we randomly generate 100 examples of the current target concept, present them to the performance element, and compute the percentage correctly predicted. In our experiments, we repeated this process 30 times and averaged the accuracies over these runs. Experiments were performed in the Massive Online Analysis software (Bifet et al., 2010), to which we added our own implementation of DWM to obtain the experimental results. MOA is an open-source framework for data stream mining applications.

For the STAGGER concepts, the feature set {shape, size, color} yields experts formed from the singleton {size} and the pairs {color, size} and {color, shape}. We compared DWM-NB, EDDM-NB and WM-NB using the same base classifier, naive Bayes, so that any difference in accuracy is due to the concept-drift algorithm and not to variation in the base classifier. We set the weighted-majority learners, DWM-NB and WM-NB with pairs of features as experts, to halve an expert's weight when it made a mistake (β = 0.5). For DWM, we set it to update its weights and to create and remove experts every fifty time steps (i.e., p = 50). The algorithm removed experts when their weights fell below 0.01 (θ = 0.01). Pilot studies indicated that these were near-optimal settings for p and k; varying β affected performance very little, and the selected value of θ did not affect accuracy but considerably reduced the number of experts. If θ was not very small, experts reached the threshold very early and had to be removed very frequently, so more experts were created than when θ was exponentially small. A change in the value of p, however, produced a large change in accuracy.

For EDDM, the values of α and β were set to 0.95 and 0.90, respectively; experimental evaluation showed these to be the best values. For the WM algorithm, NB was the learner, the values of β and θ were set to 0.5 and 0.01, respectively, and pruning of experts was allowed when the threshold θ was reached. The value of k was set to 1, so that each expert maintained a history of only its last prediction.

In the graphs shown in Fig. 1, i.e., for the STAGGER concepts, DWM and EDDM (shown in blue) completely overlap at every time step; both performed similarly, as shown by the overlapping curves. WM outperformed DWM and EDDM on the first target concept, performed comparably on the second, and performed worse on the third target concept. Within an interval of 120 instances, the concept changed abruptly three times, with EDDM and DWM showing similar curves throughout the learning and training of the experts. On data with sudden concept change, both algorithms reacted quickly and reached low error rates. EDDM and DWM performed similarly on each of the target concepts in terms of asymptote and slope. Accuracy improved with learning; results are reported with 95% confidence intervals.

To the best of our knowledge, ours was the only implementation of EDDM on STAGGER Concepts.

Fig. 1. DWM, EDDM and WM on STAGGER Concepts, which show abrupt drifts, with 95% confidence intervals. The larger dips correspond to sudden changes in distribution. EDDM and DWM overlap to show a single blue line.

Sine1g

Fig. 2. DWM, EDDM and WM on the Sine1g dataset, which shows very slow gradual drifts, with 88% confidence intervals.

There are many real-world problems that involve slow gradual change. In this section we use the Sine1g dataset to illustrate how EDDM, DWM and WM work with this kind of change. Fig. 2 shows the prequential accuracy curves obtained when EDDM, DWM and WM deal with this dataset. The plots show prequential accuracy calculated over the 500 examples of the dataset.

The local curves show that, initially, the prediction accuracy of all three algorithms is the same, but as learning progresses the WM algorithm shows the best accuracy in handling slow gradual changes, as it can easily accommodate the new classes originating from new instances. EDDM reacts earlier and more severely than DWM and WM to slow gradual drift, as shown by a large drop in accuracy at a time step when the accuracies of WM and DWM change only slightly. However, DWM reacts more often than EDDM and WM, because the frequency of classification errors continuously increases until the next concept is stable. Meanwhile, DWM shows less sensitivity to these problems, reacting later than EDDM.

To the best of our knowledge, ours is the first implementation of WM and DWM on the Sine1g dataset.

HYPERPLANE

Fig. 3. DWM (p=50), EDDM and WM on the rotating Hyperplane problem. Results were averaged over 1000 trials.

At each time step, an online learning system was trained with one example and then was tested with 100 examples generated randomly according to the current concept. The total number of training examples was 3000. Then, noise was introduced by switching the labels of 5% of the training examples. The parameters for this problem were d = 10, t = 0.3, and N = 3000.

As illustrated in Fig. 3, the predictive accuracy of WM-NB is similar to that of EDDM-NB, as shown by the overlap of the two curves throughout training and prediction. DWM-NB reacted to gradual changes very early, shown by a drop in accuracy earlier than WM and EDDM, but this also worsened its predictive accuracy. The rate of false alarms is higher for DWM than for either of the other two learning algorithms.

Using the first 1200 examples from the Moving Hyperplane problem, we investigated the false-alarm rates and the number of examples required to detect the first change, over 1000 trials. Misdetection rates for all of the methods were 0. Note that a "false alarm" means that the change is detected before time 1000 and a "misdetection" means that the change is not detected until time 1200 in this experiment. WM and EDDM were therefore not able to detect the sudden and significant change at all. The results show that DWM-NB was able to respond to the sudden and significant change quickly and accurately, even in an environment that also includes gradual changes and noise.

This is the first time that DWM, EDDM and WM have been implemented on the rotating Hyperplane problem.

SEA Dataset

Fig. 4. (a) DWM (p=1), EDDM and WM on SEA Concepts with 10% class noise. (b) DWM (p=50), EDDM and WM on SEA Concepts with 10% class noise. Results were averaged over 30 trials.

For the first quarter (i.e., 12,500 time steps), the target concept uses θ = 8; for the second, θ = 9; the third, θ = 7; and the fourth, θ = 9.5. We randomly generated a training set consisting of 10,000 examples. In our experimental analysis, we added 10% class noise. We also randomly generated 2,500 examples for testing. At each time step, we presented each method with one example, tested the resulting concept descriptions using the examples in the test set, and computed the percentage of correct predictions. We repeated this procedure 30 times, averaging accuracy over these runs.

On this problem, we evaluated DWM-NB, EDDM-NB and WM-NB. We set DWM-NB to halve expert weights (i.e., β = 0.5) and to update these weights and to create and remove experts at every time step (i.e., p = 1), as shown in Fig. 4(a). As illustrated in Fig. 4(a) and 4(b), in the first, second and third target concepts, EDDM and WM achieved similar predictive accuracies, as their slope and asymptote are almost the same. However, on the fourth target concept the accuracy of EDDM is lower than that of WM. EDDM reacts more than WM when there is a change of context. WM reaches the new target concept more quickly and with better accuracy than EDDM.

Regarding the accuracy of DWM, the value p = 1 degrades its performance considerably, because at every time step, after an incorrect global classification, the weights of the experts that predicted incorrectly are halved and so reach the threshold very early. As a result, experts do not get enough time to train and give correct predictions for new incoming instances. The rate of creation and removal of experts is much higher for DWM than for EDDM and WM.

However, as can be seen in Fig. 4(b), the value p = 50 greatly improves the predictive accuracy of DWM. DWM creates an expert when it misclassifies an example. In the noisy setting above with SEA, since 10% of the examples had been relabeled, DWM-NB made more mistakes and therefore created more experts than EDDM, which re-learns the model beyond the drift level. Thus, DWM reacts more often and more strongly than EDDM and WM. As a result, memory utilization is affected and running time increases, since DWM trains and queries its experts whenever a new example is introduced. DWM converged more quickly to the target concept, with 88% confidence intervals.

This is the first time EDDM and WM have been implemented on the SEA concepts.

Weight reduction bounds for DWM

DWM maintains a weighted, dynamic pool of experts with no limit on the number of experts. Whenever an expert predicts incorrectly, its weight is reduced by a factor β. If an expert's weight falls below the threshold θ, that expert is removed from the ensemble and a new expert with an initial weight of one is introduced, which undergoes continuous learning and training to make correct predictions for new incoming examples. The class with the maximum weight is chosen as the prediction for the new instance. Here, we define upper and lower bounds for the maximum and minimum reduction in the weights of the experts, respectively.

Theorem 1 Consider an ensemble of classifiers for instances with binary labels, each expert j ∈ {1, 2, ..., n} having the same initial weight w, n being the number of experts in the ensemble. A new instance is introduced after each time step. The maximum drop in weight, max_d, after m trials in which the ensemble makes mistakes is at most β^m, where 0 ≤ β < 1 is the reduction factor in case of misclassification, assuming m < p, the period which limits expert removal or creation.

Proof

For the upper bound on the reduction in weight, we assume that all the experts in the ensemble misclassify all the instances for the m time steps (which is practically infeasible). The initial weight of each expert is w. So,

W_0 = n·w, where W_t denotes the total weight of the ensemble of n classifiers after t time steps.

In the first trial, if the first expert makes a mistake, its weight is reduced to β·w. The same happens for all the other experts, so their weights are also reduced by the same factor. So, the total weight after all the misclassifications in the first time step is

W_1 = β·w + β·w + β·w + ... (n terms) = n·β·w

The process of misclassification continues for m time steps, so that W_m = n·β^m·w.

Therefore, the maximum drop in weights after m trials, each with incorrect predictions, is

max_d = W_m / W_0 = (n·β^m·w) / (n·w) = β^m.

Theorem 2 As an extension of Theorem 1, if θ is the weight threshold below which experts can be removed, then the number m of misclassified trials an expert can undergo before removal satisfies

m ≤ log(θ/w) / log(β).

Proof. A new expert is not created until DWM satisfies two conditions:

1. the weight of an expert ≤ θ, and

2. the number of time steps since the last removal equals p, the period which limits expert removal or creation.

So, with reference to condition 1, removal requires the weight of the expert after the final reduction to satisfy

W_m ≤ θ

From Theorem 1, β^m·w ≤ θ.

Taking logarithms of both sides: m·log(β) + log(w) ≤ log(θ), i.e., m·log(β) ≤ log(θ/w).

Since log(β) is negative for 0 ≤ β < 1, removal requires m ≥ log(θ/w)/log(β); equivalently, an expert that has not yet reached the threshold has made at most

m ≤ log(θ/w) / log(β)

misclassifications, which is the bound stated in the theorem.
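As a quick numeric sanity check of this bound, the following snippet counts the misclassifications needed before an expert's weight first falls to the threshold, using the parameter values from our experiments (β = 0.5, θ = 0.01, w = 1); the loop itself is only illustrative.

# Numeric check of the Theorem 2 bound with beta = 0.5, theta = 0.01, w = 1.
import math

beta, theta, w = 0.5, 0.01, 1.0
m, weight = 0, w
while weight > theta:          # expert removed once its weight <= theta
    weight *= beta
    m += 1
bound = math.log(theta / w) / math.log(beta)
# Removal occurs on mistake m = 7, and bound is about 6.64: the expert
# survives at most floor(bound) = 6 mistakes, consistent with Theorem 2.
print(m, bound)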

Theorem 3 Consider an ensemble of classifiers for instances with binary labels, each expert j ∈ {1, 2, ..., n} having the same initial weight w, n being the number of experts in the ensemble. A new instance is introduced after each time step. The minimum drop in weight, min_d, after m trials is given by

min_d = (β^m − 1)/n + 1,

where 0 ≤ β < 1 is the reduction factor in case of misclassification, assuming m < p, the period which limits expert removal or creation, and assuming at least one misclassification occurs at each new time step.

Proof.

The lower bound on the weight reduction is achieved when the same expert makes all the incorrect predictions and all the other experts classify correctly, before any expert removal or creation happens. The initial weight of each expert is w. So,

W_0 = n·w, where W_t denotes the total weight of the ensemble of n classifiers after t time steps, as in Theorem 1.

In the first trial, if one expert makes a mistake, its weight is reduced to β·w, while all the other n−1 experts predict correctly. So, the total weight after the first time step is

W_1 = β·w + w + w + ... (n−1 terms) = β·w + (n−1)·w

To obtain a lower bound on the weight drop, the same expert makes a misclassification in the next time step as well. So, the total weight after the second time step is

W_2 = β²·w + w + w + ... (n−1 terms) = β²·w + (n−1)·w.

This process continues, and the total weight after m time steps is

W_m = β^m·w + w + w + ... (n−1 terms) = β^m·w + (n−1)·w.

Therefore, min_d = W_m / W_0 = (β^m·w + (n−1)·w) / (n·w) = (β^m + n − 1)/n = (β^m − 1)/n + 1.
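The two extreme cases underlying Theorems 1 and 3 can be checked with a small simulation; the values n = 5, β = 0.5 and m = 3 below are arbitrary illustrative choices.

# Small simulation checking the two extreme cases of Theorems 1 and 3:
# every expert wrong on every trial (maximum drop, ratio beta**m) versus a
# single expert wrong on every trial (minimum drop, (beta**m - 1)/n + 1).
n, beta, w, m = 5, 0.5, 1.0, 3

# Theorem 1: all n experts misclassify on each of the m trials.
weights = [w] * n
for _ in range(m):
    weights = [wt * beta for wt in weights]
max_drop_ratio = sum(weights) / (n * w)
assert abs(max_drop_ratio - beta ** m) < 1e-12                    # = 0.125

# Theorem 3: only the first expert misclassifies on each trial.
weights = [w] * n
for _ in range(m):
    weights[0] *= beta
min_drop_ratio = sum(weights) / (n * w)
assert abs(min_drop_ratio - ((beta ** m - 1) / n + 1)) < 1e-12    # = 0.825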

Open Research Issues

After an in-depth study of the literature and our experimental work in the field of ensemble classifiers for handling concept drift, some issues that need to be addressed are:

EDDM (Baena-García et al., 2006) has been implemented as a wrapper around a learning algorithm, but it still needs to be implemented inside an online or incremental learning algorithm.

A method needs to be developed to automatically assign values to the two parameters used by EDDM, α and β, which have so far been fixed experimentally.

In DWM (Kolter and Maloof, 2003), one could use as the base algorithm another decision-tree learner that does not retain earlier examples and does not periodically perform restructuring, such as VFDT (Domingos and Hulten, 2000) or CVFDT (Hulten et al., 2001).

DWM (Kolter and Maloof, 2003) could also take into account an expert's age or its history of predictions when creating or deleting experts, which may improve accuracy and memory utilization. We plan to discuss this in a forthcoming paper.

There is no mechanism in DWM to explicitly handle noise or to determine when examples are likely to be from a different concept.

The rate of false alarms is very high in case of DWM. Work needs to be done to improve the accuracy of DWM.

No work has been done to investigate the relationship between a learner's stability and its ability to handle drifts. If a learner is stable for very long, we can say that the learner captures the general trend of that data stream.

DWM could be adapted to handle recurrent drifts (an area where comparatively little work has been done in the literature) by introducing the use of memories.

A study of the literature shows that no work has been done to handle predictable and non-periodic sequential drifts, so there is considerable scope for research in these areas.

One could design a careful strategy that benefits more from the high-diversity ensemble before drift detection, which historically has been used only for learning and not for prediction.

The use of memories for dealing with recurrent or predictable drifts is proposed as future work.

It remains to be explored whether the difficulty of the concept learnt before a drift influences the learning of the new concept.

The study of other features of the ensembles, such as ensemble size, base learners' individual accuracy and feature selection, may be explored to design new, improved algorithms for drift handling.

Another direction would be to investigate the use of mutation to deal with concept drifts.

For handling severe and sudden drifts, new ways to assign weights to the ensembles could also be investigated, in order to improve accuracy soon after drift detection.

Future work could be to design a drift detection method that can identify the type of drift and then use the most appropriate strategy for handling it.

There is considerable scope for research on learning from training examples that may contain many irrelevant attributes, using ensembles of higher diversity.

Other measures of diversity (besides the Q-statistic) could also be explored, which may give better diversity measurement in less time and with lower complexity.

Summary and Conclusions

If the learning setting has the property that one can hope to predict well by using a collection of simple rules, these algorithms have the advantages of being fast, of being able to focus quickly on relevant features, of adapting well to target concepts that change with time, and of having reasonable accuracy. Experimental results support the claim that ensemble methods with weighting mechanisms, such as DWM, converge more quickly to the target concepts than methods that replace unweighted learners in the ensemble. As can be seen from the experiments, the weighted majority algorithm speeds up as it learns, in contrast to most learning algorithms, which slow down as they learn. Experimental evaluation of DWM on the larger SEA data set shows that DWM reacts more often and more strongly when dealing with noisy examples. Results on the Hyperplane dataset show that DWM responds to sudden and significant change quickly and accurately in an environment that also includes gradual changes and noise. When implemented on Sine1g, EDDM reacts earlier and more severely than DWM and WM and is thus the best algorithm for handling very slow gradual drifts.


