Related Work on Mining Positive and Negative Association Rules


Chapter 2

The association rule mining task can be stated as follows. Let I be a set of items, and let T be a database of transactions, where each transaction has a unique identifier (tid) and contains a set of items. A set of items is also called an itemset. The support of an itemset X, denoted by σ(X), is the number of transactions that contain X. An itemset is frequent if its support is greater than a user-specified minimum support (min sup) value. An association rule is an implication of the form A → B, where A, B ⊆ I and A ∩ B = Φ. The support of the rule is given as σ(A ∪ B), and the confidence as σ(A ∪ B)/σ(A) (i.e., the conditional probability that a transaction contains B, given that it contains A). The mining task consists of two steps: 1) Find all frequent itemsets. 2) Generate rules that satisfy minimum confidence. The second step is relatively straightforward; therefore, the complexity of an algorithm mainly depends on the complexity of step 1.
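As a small illustration of these definitions, the following Python sketch computes σ(A ∪ B) and the confidence of A → B over a toy transaction database; the transactions and item names are invented purely for the example.

# Toy illustration of support and confidence (hypothetical mini-database).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

A, B = {"bread"}, {"milk"}
sup_rule = support(A | B)              # sigma(A U B) = 2
confidence = sup_rule / support(A)     # sigma(A U B) / sigma(A) = 2/3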

2.1.1 Brute-Force Approach

In this approach, first list all possible association rules, compute the support and confidence of each rule, and then prune the rules that do not satisfy the minsup and minconf thresholds given by the user. For a given number of items d, the total number of possible association rules can be calculated using the formula:

R = 3^d - 2^(d+1) + 1

For instance, for d = 6 the number of possible association rules is R = 602. The relationship between the number of items and the number of possible rules is shown below.

Figure 2.1: Relationship between Number of Items (d) and Number of Possible Rules ( R )

From Figure 2.1, it is evident that the number of possible association rules grows exponentially with the number of items. Therefore, this approach would be computationally prohibitive.
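The count of 602 rules for d = 6 can be checked directly; the short Python loop below evaluates R = 3^d - 2^(d+1) + 1 for a few values of d and makes the exponential growth visible.

for d in range(1, 11):
    R = 3**d - 2**(d + 1) + 1      # total number of possible rules over d items
    print(d, R)                    # d = 6 gives R = 602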

2.1.2 Finding Frequent Itemsets Using Candidate Generation

The Apriori algorithm was proposed for mining frequent itemsets for Boolean association rules [57]. The algorithm uses prior knowledge of frequent itemsets to generate the itemsets of the next level. In general, Apriori consists of two steps: candidate generation and finding frequent itemsets. Initially, it finds the frequent 1-itemsets from the database; the result set is denoted L1. L1 is used for generating the candidate 2-itemsets, from which L2 is found. L2 is used to generate the candidate 3-itemsets, from which L3 is found. The same process is repeated until no more frequent itemsets are found in the given dataset.

To improve the efficiency of the level-wise generation of frequent itemsets, an important property, called the Apriori property, is used to reduce the search space: all nonempty subsets of a frequent itemset must also be frequent.

Finding LK from LK-1 is a two-step process consisting of join and prune actions.

The join step: To find CK, a set of candidate K-itemsets is generated by joining LK-1 with itself. The join operation, LK-1 ⋈ LK-1, is performed, where two members of LK-1 are joinable if their first (K-2) items are common and their (K-1)th items differ.

The prune step: Using the Apriori property, this step eliminates from support counting those candidate K-itemsets that have a (K-1)-subset which is not frequent.

Algorithm: Apriori (TDB, ms)

{

Find the set of all frequent 1-itemsets from TDB and denote it by L1

for ( K = 2; LK-1 ≠ Φ ; K++)

{ /* Candidate Generation Phase*/

CK = LK-1 ⋈ LK-1

/* Pruning using Apriori property*/

for each candidate c ∈ CK

{

for each (K-1)-subset s of c

{

if s ∉ LK-1 then CK = CK – {c}

}

}

/* finding support counts of itemsets in CK */

Scan the database and find support for all c ∈ CK

LK = { c | c ∈ CK ∧ sup(c) ≥ ms }

L = L U LK

}

}
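For concreteness, the following Python sketch implements the same join-prune-count loop; the transaction representation (a list of item sets) and the absolute min_sup count are assumptions made for the illustration, not details of [57].

from itertools import combinations

def apriori(transactions, min_sup):
    """Return a dictionary mapping each frequent itemset (frozenset) to its support count."""
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {i: c for i, c in counts.items() if c >= min_sup}
    frequent = dict(L)
    k = 2
    while L:
        # Join step: LK-1 joined with itself
        prev = list(L)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Prune step: every (K-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Support counting: one scan of the database per level
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= set(t):
                    counts[c] += 1
        L = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(L)
        k += 1
    return frequent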

2.1.3 Generalizing Association Rules to Correlations

The concept of negative relationships was introduced by a group of researchers [63]. Statistical tests are employed to verify the independence between two variables. A correlation measure, whose value ranges between -1 and 1, was used to determine whether the relationship between the variables is positive or negative. A chi-squared based model is proposed for finding the set of minimal correlated itemsets.

Algorithm: χ2-support (α, s, p, TDB)

{

for each item i∈ I, find support ( i )

for each pair of items i1, i2 ∈ I such that support(i1) > s & support (i2) > s then C = C ∪ {i1, i2}

NOTSIG = Φ

if (C == Φ) return SIG

for each itemset ∈ C

construct the contingency table for the item set

if (cell count < p %)

continue with the next itemset in C

if ( χ2 table > χ2α )

SIG = SIG ∪ itemset

else

NOTSIG = NOTSIG ∪ itemset

Continue with the next itemset in C. If there are no more itemsets in C, then set C to be the set of all sets S such that every subset of size |S| - 1 of S is in NOTSIG, and repeat from the test of whether C is empty.

}
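The core of the test above is a chi-squared statistic over the 2 x 2 contingency table of an item pair. The sketch below (Python) shows one way such a statistic can be computed; the transaction format is an assumption, and the usual cut-off at the α = 0.05 level with one degree of freedom is 3.84.

def chi_squared_pair(transactions, i1, i2):
    """Chi-squared statistic of the 2 x 2 contingency table for items i1 and i2."""
    n = len(transactions)
    n1 = sum(1 for t in transactions if i1 in t)   # transactions containing i1
    n2 = sum(1 for t in transactions if i2 in t)   # transactions containing i2
    chi2 = 0.0
    for a in (True, False):          # a: does the transaction contain i1?
        for b in (True, False):      # b: does the transaction contain i2?
            observed = sum(1 for t in transactions
                           if (i1 in t) == a and (i2 in t) == b)
            expected = (n1 if a else n - n1) * (n2 if b else n - n2) / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2

# A value above 3.84 rejects independence of i1 and i2 at the 95% level.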

Mining Strong Negative Associations in a Large Database of Customer Transactions

A variant approach to mining negative associations has been proposed in which frequent itemsets are combined with domain knowledge in the form of a taxonomy [65]. As the algorithm is domain dependent and requires a predefined taxonomy, it is difficult to generalize.

To find the negative itemsets the steps involved are:

(1) Find all the generalized large itemsets in the data (i.e., itemsets at all levels in the taxonomy whose support is greater than the user specified minimum support)

(2) Identify the candidate negative itemsets based on the large itemsets and the taxonomy and assign them expected support.

(3) Count the actual support for the candidate itemsets and retain only the negative itemsets. The interest measure RI of a negative association rule X → ┐Y is given as RI = (E[support(X ∪ Y)] - support(X ∪ Y)) / support(X), where E[support(X)] denotes the expected support of an itemset X; a small numeric illustration follows below.
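A small numeric illustration of RI (Python); the support values are hypothetical numbers chosen only to show the computation.

def rule_interest(expected_sup_xy, sup_xy, sup_x):
    """RI = (E[support(X U Y)] - support(X U Y)) / support(X)."""
    return (expected_sup_xy - sup_xy) / sup_x

# E[support(X U Y)] = 0.20, support(X U Y) = 0.05, support(X) = 0.30  ->  RI = 0.5
print(rule_interest(0.20, 0.05, 0.30))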

Algorithm: NegativeCandidateGeneration (MinSup, MinRI)

{

L1 = {large 1-itemsets}

K = 2 /* K represents the pass number */

/* First generate all large itemsets */

while (LK-1 ≠ Φ)

{

CK = GenCand (LK-1)

for all transactions t ∈ TDB

{

Ct = subset(CK, t)

for all candidates c ∈ Ct

c.count++

}

LK = {c ∈ CK | c.count ≥ MinSup}

K = K +1

}

/* Now generate negative itemsets */

Delete all small 1-itemsets from the taxonomy

K = 2

While (LK ≠ Φ)

{

/* Generate negative candidates of size K */

NCK = GenNegCand (LK)

NC = NC ∪ NCK

K = K +1

}

for all transactions t ∈ TDB

{

NCt = subset(NCk, t)

for all candidates c ∈ NCt

c.count++

}

NK = {c ∈NCK|c.count < MinSup*MinRI}

}

Algorithm: RuleGeneration( nk)

{

for all negative itemsets nK of size K, K ≥ 2

{

H1 = {consequents of rules generated from nk with one item in the consequent}

call genrules (nK, H1, L2, LK-2)

}

}

Procedure genrules (nK, Hm, Lm+1, LK-m-1)

{

if (k > m+1)

{

Hm+1 = apriori-gen (Hm)

for all hm+1 ∈ Hm+1

{

if (hm+1 ∈ Lm+1)

{

if ((nk - hm+1) ∈ Lk-m-1)

{

RI = (E[sup(nk)] - sup(nk)) / sup(nk - hm+1)

if (RI ≥ MinRI)

{ output rule (nK - hm+1) ⇏ hm+1 }

else

{ delete hm+1 from Hm+1 }

}

else

{ delete hm+1 from Hm+1 }

}

}

}

call genrules (nK, Hm+1)

}

Mining Substitution Rules for Statistically Dependent Items

Substitution Rule Mining (SRM) has been proposed to mine substitution rules from a given dataset [73]. Mining substitution rules involves two procedures: finding concrete itemsets and generating substitution rules. To generate concrete itemsets, the authors used the chi-square test as a statistical measure of the dependency among the items within an itemset, and the correlation coefficient to find the correlation between itemsets.

Concrete itemsets are those itemsets which could be choices for customers with some purchasing purpose. To qualify an itemset as a concrete one, not only the purchasing frequency, i.e., the support of the itemset, but also the dependency of its items has to be examined, in order to establish that these items are purposefully purchased together by customers.

Let X = {x1, x2, …, xk} be a positive k-itemset. The chi-square value for the itemset X is computed as

Chi(X) = Σ over all Y (n · sup(Y) - EY)^2 / EY

where the summation ranges over the 2^k itemsets Y obtained from X by replacing any subset of its items with their complements, n is the total number of transactions, Y+ denotes the positive itemset in which all complement items in Y are replaced by their positive counterparts, such as {a, ┐b, ┐c}+ = {a, b, c} where a, b and c are positive items, and EY = n · Π P(yi) is the expected support of Y under the independence assumption, with P(yi) = sup(yi) for a positive item and P(┐yi) = 1 - sup(yi) for a complement item.
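The sketch below (Python) computes a chi-square statistic of this kind for a k-itemset by enumerating all 2^k positive/complement combinations; it follows the standard Pearson chi-square form under the item-independence assumption and is only an illustration consistent with the definitions above, not the authors' implementation.

from itertools import product

def chi_square_itemset(transactions, items):
    """Chi-square of itemset X = items against independence, over all 2^k sign patterns."""
    n = len(transactions)
    p = {x: sum(1 for t in transactions if x in t) / n for x in items}  # single-item supports
    chi2 = 0.0
    for signs in product((True, False), repeat=len(items)):  # True = positive item, False = complement
        observed = sum(1 for t in transactions
                       if all((x in t) == s for x, s in zip(items, signs)))
        expected = n
        for x, s in zip(items, signs):
            expected *= p[x] if s else (1.0 - p[x])
        if expected > 0:
            chi2 += (observed - expected) ** 2 / expected
    return chi2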

Algorithm: Apriori-Dual ( MinSup, MinConf )

{

Append the complement items whose positive counterpart is not originally present to each transaction

Generate the set of frequent (positive and negative) items, i.e., L1

Remove the negative items whose positive counterpart is not frequent from L1

for ( K = 2; ; K++ )

{

Generate the candidate set of K-itemsets from LK-1 i.e., CK = LK-1 ⋈ LK-1

if (CK = Φ) then break

Scan the transactions to calculate supports of all candidate K-itemsets

LK = { c ∈ CK | Sc ≥ MinSup}

}

/* procedure of negative association rule generation*/

for each negative itemset X in LK

{

Let ┐Y be the largest pure negative itemset such that ┐Y ⊂ X

if (X - ┐Y) is not an empty set // (X - ┐Y) is positive

if (Conf((X - ┐Y) → ┐Y) ≥ MinConf)

Output the rule (X - ┐Y) → ┐Y

}

}

Algorithm: SRM ( MinSup, MinConf, ρmin )

{

L1 = The set of all positive frequent itemsets

set of concrete itemsets = L1

for ( K = 2; ; K++ )

{

Generate the candidate set of k-itemsets from LK-1 i.e., CK = LK-1 ⋈ LK-1

if (CK = Φ) then break

Scan the transactions to calculate supports of all candidate K-itemsets

LK = { c ∈ CK| Sc ≥ MinSup}

for each frequent itemset X in LK

{

if (SX > Πxi ∈X Sxi ) &&( Chi(X) ≥ χ2df (X), α )

Add X to the set of concrete itemsets

}

}

/* procedure of substitution rule generation*/

for each pair of concrete itemsets X, Y

{

if (ρ (X, Y) < -ρmin )

if (sup(X → ┐Y) ≥ MinSup && conf(X → ┐Y) ≥ MinConf)

/* X → ┐Y is valid */

Output the substitution rule X ⊳ Y

}

}

Mining Negative Association Rules

A variant method for mining negative association rules has been proposed [82]. It consists of three steps:

(1) Find a set of positive rules.

(2) Generate negative rules based on existing positive rules and domain knowledge.

(3) Prune the redundant rules.

The support and confidence of the LHS negative rule (┐X → Y) and the RHS negative rule (X → ┐Y) can be computed from the positive quantities using the following formulae:

sup(┐X → Y) = sup(Y) - sup(X ∪ Y),  conf(┐X → Y) = (sup(Y) - sup(X ∪ Y)) / (1 - sup(X)) (2.5)

sup(X → ┐Y) = sup(X) - sup(X ∪ Y),  conf(X → ┐Y) = 1 - conf(X → Y) (2.6)
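A minimal sketch (Python) of these complement relations, with supports expressed as relative frequencies; the input numbers are hypothetical.

def negative_rule_measures(sup_x, sup_y, sup_xy):
    """Support and confidence of X -> not-Y and not-X -> Y from positive quantities."""
    rhs_negative = {                       # X -> not-Y
        "support": sup_x - sup_xy,
        "confidence": 1.0 - sup_xy / sup_x,
    }
    lhs_negative = {                       # not-X -> Y
        "support": sup_y - sup_xy,
        "confidence": (sup_y - sup_xy) / (1.0 - sup_x),
    }
    return rhs_negative, lhs_negative

# e.g. sup(X) = 0.4, sup(Y) = 0.5, sup(X U Y) = 0.1
print(negative_rule_measures(0.4, 0.5, 0.1))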

Algorithm: MNAR ( minsup, minconf, TDB)

{

Find all valid positive frequent itemsets (F) and Positive Association Rules (PAR) using Apriori Algorithm

/* Generate Negative Rules*/

Delete all infrequent items from the taxonomy

for all rules r ∈ PAR

{

TRSs = GNC(r)

for all rules tr ∈ TRSs

{

if (SM(tr.conf, t.conf) > confDeviate)

{

if( supp(Neg(tr)) > minsup & conf(Neg(tr)) > minconf))

NAR = NAR ∪ Neg(tr)

}

/* Pruning*/

if all members of LOS have common itemset that form {r1, r2,….. rn} ⊆ Rule

{

delete rk, where rk falls in the categories

}

}

}

Mining Positive and Negative Association Rules

The most common framework for association rule generation is the "support-confidence" one. A new framework, called support-confidence-correlation, has been proposed [40], in which the phases of mining frequent itemsets and generating strong association rules are combined. It generates the appropriate rules while finding the correlations within each candidate itemset, and thus redundant item combinations are eliminated. Indeed, for each generated candidate itemset, all possible combinations of items are computed to analyze their correlations. In the end, only rules generated from item combinations with strong correlation are kept. If the correlation is positive, a positive rule is discovered. If the correlation is negative, two negative rules are discovered. The negative rules produced are of the form X → ┐Y or ┐X → Y, which the authors term "confined negative association rules"; here the entire antecedent or consequent is either a conjunction of negated attributes or a conjunction of non-negated attributes.
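The correlation test that drives this framework can be sketched as follows (Python); the phi-style coefficient below is one standard way to obtain a value in [-1, 1] from itemset supports, and the threshold handling mirrors the pseudocode that follows, but the exact measure used in [40] should be taken from the paper itself.

import math

def correlation(sup_a, sup_b, sup_ab):
    """Phi-style correlation between itemsets A and B, ranging over [-1, 1]."""
    denom = math.sqrt(sup_a * (1 - sup_a) * sup_b * (1 - sup_b))
    return (sup_ab - sup_a * sup_b) / denom if denom else 0.0

def rule_type(sup_a, sup_b, sup_ab, rho_min=0.5):
    rho = correlation(sup_a, sup_b, sup_ab)
    if rho >= rho_min:
        return "positive rule candidate: A -> B"
    if rho <= -rho_min:
        return "confined negative rule candidates: A -> not-B and not-A -> B"
    return "correlation too weak: no rule"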

Algorithm: PNARG (ms, mc, ρmin , TDB)

{

if ρmin is undefined then ρmin = 0.5

Find L1, the set of frequent 1-itemsets

for (K = 2; LK-1 ≠ Φ; K + +)

{

CK = LK-1 ⋈ L1

for each i ∈ CK

{

s = support(TDB, i)

if (s ≥ ms) then LK = LK ∪ {i}

for each A,B where A ∪ B = i

{

ρ = correlation between A and B

if (ρ ≥ ρmin )

{

if (confidence(A → B) ≥ mc) then PAR = PAR ∪ {A → B}

else if (supp(┐A → ┐B) ≥ ms && confidence(┐A → ┐B) ≥ mc) then NAR = NAR ∪ {┐A → ┐B}

}

if (ρ ≤ -ρmin) /* ρ < 0 and |ρ| ≥ ρmin */

{

if (confidence(A → ┐B) ≥ mc) then NAR = NAR ∪ {A → ┐B}

if (confidence(┐A → B) ≥ mc) then NAR = NAR ∪ {┐A → B}

}

}

}

}

AR = PAR ∪ NAR

if (AR = Φ)

{

ρmin = ρmin - 0.1

if ( ρmin ≥ 0)

call PNARG( ms, mc, ρmin , TDB)

}

return AR

}

Efficient Mining of Both Positive and Negative Association Rules

A novel approach has been proposed to mine positive and negative association rules [79]. In this approach, the generation of valid positive and negative association rules is divided into the following steps:

Generate the set of frequent itemsets of interest (FI) and the set of infrequent itemsets of interest (IFI).

Extract positive rules of the form A → B from FI, and negative rules of the forms A → ┐B, ┐A → B and ┐A → ┐B from IFI.

To generate FI, IFI and the negative association rules, they developed three functions, namely fipis( ), iipis( ) and CPIR( ).

Algorithm: AllItemsetsofInterest(TDB, ms, mc, mi)

{

FI = Φ

IFI = Φ

L1 = set of frequent 1-itemsets

FI = FI ∪ L1

for (K = 2; LK−1 ≠ Φ; K ++)

{

CK = LK-1 ⋈ LK-1

LK = set of all frequent itemsets

NK = CK – LK

for each itemset i ∈ LK

{

if i is not frequent itemset of potential interest then LK = LK − {i }

FI = FI ∪ LK

}

for each itemset j ∈ NK

{

if j is not infrequent itemset of potential interest then NK = NK − {j}

IFI = IFI ∪ NK

}

}

return FI and IFI

}

The algorithm for extracting both positive and negative association rules with the probability ratio model for confidence checking is designed as follows:

Algorithm: PositiveAndNegativeAssociations ( TDB, ms, mc, mi)

{

for each frequent itemset i ∈ FI

{

for each A,B where A ∪ B = i

{

if( fipis(A, B ) )

if (CPIR(B|A) ≥ mc) then PAR = PAR ∪ {A → B} with confidence = CPIR(B|A) and support = supp(i)

if (CPIR(A|B) ≥ mc) then PAR = PAR ∪ {B → A} with confidence = CPIR(A|B) & support = supp(i)

}

}

/* Generate all negative association rules in IFI*/

for each itemset j ∈ IFI

{

for each A,B where A ∪ B = j

{

if( iipis(A, B ) )

{

if (CPIR(B|┐A) ≥ mc) then NAR = NAR ∪ {┐A → B} with confidence = CPIR(B|┐A) & support = supp(┐A ∪ B)

if (CPIR(┐A|B) ≥ mc) then NAR = NAR ∪ {B → ┐A} with confidence = CPIR(┐A|B) & support = supp(B ∪ ┐A)

if (CPIR(┐B|A) ≥ mc) then NAR = NAR ∪ {A → ┐B} with confidence = CPIR(┐B|A) & support = supp(A ∪ ┐B)

if (CPIR(A|┐B) ≥ mc) then NAR = NAR ∪ {┐B → A} with confidence = CPIR(A|┐B) & support = supp(┐B ∪ A)

if (CPIR(┐B|┐A) ≥ mc) then NAR = NAR ∪ {┐A → ┐B} with confidence = CPIR(┐B|┐A) & support = supp(┐A ∪ ┐B)

if (CPIR(┐A|┐B) ≥ mc) then NAR = NAR ∪ {┐B → ┐A} with confidence = CPIR(┐A|┐B) & support = supp(┐B ∪ ┐A)

}

}

}

}

The Conditional-Probability Increment Ratio (CPIR) of a pair of itemsets X and Y, used as the confidence measure of the rules above, is defined (for supp(X)(1 - supp(Y)) ≠ 0) as

CPIR(Y|X) = (supp(X ∪ Y) - supp(X) supp(Y)) / (supp(X) (1 - supp(Y)))
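Assuming the form of CPIR given above, a direct transcription into Python is shown below; the negated forms are obtained by substituting 1 - supp(·) and the corresponding joint supports, and the numeric example is hypothetical.

def cpir(sup_x, sup_y, sup_xy):
    """CPIR(Y|X) = (supp(X U Y) - supp(X)*supp(Y)) / (supp(X)*(1 - supp(Y)))."""
    denom = sup_x * (1.0 - sup_y)
    return (sup_xy - sup_x * sup_y) / denom if denom else 0.0

# CPIR(not-Y | X) follows by replacing supp(Y) with 1 - supp(Y) and
# supp(X U Y) with supp(X) - supp(X U Y); the other negated forms are analogous.
print(cpir(0.4, 0.5, 0.3))   # 0.5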

Mining Positive and Negative Association Rules from Large Databases

An innovative approach has been proposed [11]. In this approach, the generation of positive and negative association rules consists of five steps: (i) generate all positive frequent itemsets L(P1); (ii) for all itemsets I in L(P1), generate negative frequent itemsets of the form ┐(I1 I2); (iii) generate all negative frequent itemsets of the form ┐I1 ┐I2; (iv) generate all negative frequent itemsets of the form I1 ┐I2; and (v) generate all valid positive and negative association rules. The negative rules are generated without adding any additional interestingness measure to the support-confidence framework.

Procedure PNAR ( )

{

Find positive frequent itemsets

Find negative frequent itemsets of the form ┐( XY)

Find negative frequent itemsets of the form ┐X ┐ Y

Find negative frequent itemsets of the form ┐X Y

Find positive and negative association rules for the items generated in steps 1-4

}

An Effective Algorithm for Mining Positive and Negative Association Rules

Mining positive and negative association rules approach consists of three sub problems [23]:

(1) Finding the frequent itemsets (PL) and infrequent itemsets (NL).

(2) From PL, find valid positive rules of the form A → B.

(3) From NL, find valid negative rules of the forms A → ┐B and ┐A → B.

Let TDB be a transactional database, and let ms, mc, dms and dmc be given by the user. The algorithm for extracting both positive and negative association rules with the correlation coefficient measure

corrA,B = supp(A ∪ B) / (supp(A) * supp(B)) (2.8)

is designed as follows:

Algorithm: PNAR (TDB, ms, mc, dms, dmc )

{

positiveAR = Φ ; negativeAR = Φ /*positive and negative AR itemsets*/

PL= Φ ; NL= Φ

L1 = the set of frequent 1-itemsets

PL= PL∪L1

for (K = 2; LK-1 ≠ Φ ; K + +)

{

perform join operation on LK-1 to itself and let it be CK

for each i ∈ CK

{ S = supp( i )

if (S ≥ ms)

{ LK = LK ∪ {i}

PL = PL ∪ LK

}

else

{

NLK = NLK ∪ {i}

NL = NL ∪ NLK

}

}

}

for each frequent itemset i ∈ PL

{

for each expression A ∪ B = i and A ∩ B = φ

{

corrA,B = supp(A∪B) / ( supp(A) * supp(B))

if (corrA,B >1)

if (conf(A → B) ≥ mc )

positiveAR = positiveAR ∪ { A → B }

}

}

for each infrequent itemset i ∈ NL

{

for each expression A ∪ B = i and A ∩ B = φ

{

corrA,B = supp(A∪B) / ( supp(A) * supp(B))

if(corrA,B < 1)

if (supp(A ┐B ) ≥ dms && conf(A ┐B) ≥ dmc )

{ negativeAR = negativeAR ∪{ A ┐B } }

if(supp(┐A B ) ≥ dms && conf(┐A B) ≥ dmc)

{ negativeAR = negativeAR ∪{ ┐A B } }

}

}

AR = positiveAR ∪ negativeAR

return AR

}

2.2 Related Work on Indirect Association Rule Mining

Indirect association and negative association are similar in that both deal with infrequent itemsets. A negative association rule identifies a set of items that a customer is not likely to buy together with a certain set of other items. The most significant difference between negative associations and indirect associations is that a mediator is central to the concept of indirect associations.

2.2.1 Mining Higher Order Dependencies in Data

The first algorithm for mining indirect associations between pairs of items is presented below [72].

Algorithm: INDIRECT (TDB, ts,td)

{

Find L, the set of all frequent itemsets using Apriori algorithm

P = Φ

for ( K = 2 ; K≤ n; K++)

{

CK+1 = join LK to itself

Consider any two items x, y ∈ CK+1 such that i ⊂ x, j ⊂ y and x ∩ y = M

for each < i, j , M > ∈ CK+1

{

if (sup({i , j}) < ts and dep({i}, M)≥ td and dep({j}, M) ≥ td)

{

P = P ∪ {< i , j , M> }

}

}

}

}

There are two major phases in this algorithm:

1. Find frequent itemsets using Apriori algorithm

2. Discover all indirect associations by performing Candidate generation and pruning

INDIRECT algorithm requires two join operations. The first one is in the Apriori algorithm in order to generate all frequent itemsets and the other one is in the generation of candidates for mining indirect associations. In general, join operation is quite expensive. The join operation in INDIRECT is more expensive than the join operation in the Apriori.
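To make the candidate test concrete, the sketch below (Python) checks whether an item pair (i, j) is indirectly associated via a mediator set M. The sup() callback is assumed to return relative supports, and the IS (cosine) measure is used here as one possible choice for the dependence function dep(); the cited work may use a different dependence measure.

def is_indirect(sup, i, j, mediator, ts, td):
    """Test the indirect-association pattern <i, j | M>."""
    def dep(x, m):
        # IS (cosine) dependence between item x and mediator itemset m
        return sup(frozenset({x}) | m) / (sup(frozenset({x})) * sup(m)) ** 0.5
    if sup(frozenset({i, j})) >= ts:
        return False                 # the pair itself must be infrequent
    m = frozenset(mediator)
    return dep(i, m) >= td and dep(j, m) >= td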

2.2.2 Indirect Association Mining

The proposed Indirect Association Mining (IAM) algorithm proceeds in four phases: an initialization phase, a pruning phase, a bridge itemset calculation phase, and a ranking phase. The purpose of the initialization phase is to allocate the memory needed: an item-pair support value matrix M is constructed and then updated by scanning the entire database through the function MatrixUpdate(). The second phase is a pruning process whose purpose is to minimize the search space of the problem; the pruning threshold is MinSup(s). The third phase, the bridge itemset calculation phase, is the most important one of this algorithm. In this phase, the items A and B, which may be indirectly associated, are first obtained by calling the function ItemMap(); the bridge itemset is then calculated by calling the function BridgeFind(). The bridge itemset X and the two indirectly associated items A and B are saved, and for every non-empty itemset X the closeness is inserted into a linked vector C. The last phase, the ranking phase, mainly performs the ranking operation according to the closeness values in the linked vector C, so as to provide decision makers with the most useful indirect association rules [35].

Algorithm: IndirectAssociationMining ( T, X, min-sup(s), min-sup(F) , min-con(F), min-dpd, M, mi, C)

{

/* Initialization Phase*/

M = malloc([mi = 0])

for each transaction t∈ T

{ MatrixUpdate(M,ti) }

/*Pruning Phase*/

PruneMatrix(M, min_sup(s))

/* Bridge Itemset Calculation Phase*/

for each element mi ∈ M

{

When mi≠ 0

{

A = ItemMap(mi ,0)

B= ItemMap(mi , 1)

}

X=BridgeFind(M,A,B, min-sup(F), min-con(F), min_dpd)

When (X ≠ Φ)

{

C = VectorInsert(c(A,B)|X)

Save(X, A, B)

}

}

/*Ranking Phase*/

VectorSort(C)

}

2.2.3 An Efficient Approach to Mining Indirect Associations

For mining indirect associations between items, a new approach based on HI-Struct has been presented [78]. The algorithm is described as follows.

HI-Mine()

{

Construct the HI-struct data structure

Find item pair support and mediator support for each frequent item

Generate all indirect associations between item pairs whose item pair support is less than item pair threshold and mediator support is greater than mediator support threshold

}

2.3 Related Work on Classification based on Association Rules

The problem of associative classification is to discover a set of valid class association rules in order to build a classifier that assigns a class label to a given object.

2.3.1 Integrating Classification and Association Rule Mining

The Classification Based on Associations (CBA) algorithm was one of the first AC (Associative Classification) algorithms; it employs an Apriori candidate generation step to find the rules [6]. The steps of CBA are as shown below:

If any continuous attribute is present then perform discretization.

Find all class association rules whose support is greater than minimum support.

Find a subset of rules, generated in step 2, whose confidence is more than minimum confidence and then form a classifier.

Drawbacks:

CBA generates many rules for dominant classes

Few or no rules for the minority classes.

To overcome these drawbacks, the same authors presented a modified version, called CBA(2), which uses a separate support threshold for each class, based on the class frequencies in the training data set.

It consists of two parts: (1) CBA-RG, which generates class association rules using the Apriori algorithm, and (2) CBA-CB, which constructs a classifier based on the rules generated in the previous step.

The CBA-RG Algorithm

The Apriori process is used to generate frequent rule items by scanning the database. In the first pass, all individually frequent rule items in the dataset are found. In each subsequent pass, the frequent rule items found in the previous pass are used as seeds to generate the candidate rule items of the current pass, from which the frequent rule items are selected. The same process is repeated until no more frequent rule items are found in the database.

Algorithm: CBA-RG ( )

{

F1 = find frequent 1-ruleitems

CAR1 = genRules(F1)

prCAR1 = pruneRules (CAR1)

for (K = 2; FK-1≠Φ; K++)

{

CK = generate candidate rule items from FK-1

for each data case d∈ D

{

Cd = ruleSubset(CK, d)

for each candidate c∈ Cd

{

c.condsupCount++

if (d.class = c.class )

{

c.rulesupCount++

}

}

}

FK = set of all frequent rule items whose support is more than minimum support

CARK = genRules(FK);

prCARK = pruneRules(CARK)

}

CARs = UK CARK

prCARs = UK prCARK

}
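The support-counting step of CBA-RG can be sketched as follows (Python). A ruleitem is modelled here as a (condset, class) pair and a data case as an (itemset, label) pair; these representations are assumptions made for the illustration.

def count_ruleitems(data, candidates):
    """Count condsupCount and rulesupCount for each candidate ruleitem in one scan."""
    cond_count = {c: 0 for c in candidates}
    rule_count = {c: 0 for c in candidates}
    for items, label in data:                    # one data case
        for condset, cls in candidates:
            if condset <= items:                 # the case covers the condition set
                cond_count[(condset, cls)] += 1
                if label == cls:                 # ... and agrees on the class label
                    rule_count[(condset, cls)] += 1
    return cond_count, rule_count  # confidence = rulesupCount / condsupCount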

Building a Classifier

A subset of the rules produced by the previous phase is selected to construct a classifier, in such a way that the selected rules minimize the number of errors made on the training data. The proposed algorithm is a heuristic one; however, the resulting classifier performs very well compared with the one built by C4.5. A naive version of the algorithm (called M1) for building such a classifier is shown below.

Algorithm:CBA_CB_ M1( )

{

Sort the set of generated rules ( R ) according to their precedence

Select rules for classifier C from R

Discard those rules in C that do not improve the accuracy of the classifier

}

The drawback of the above algorithm is the number of passes over the database. The modified version of the algorithm is presented below.

Algorithm: CBA_CB_M2( )

{

For each case c, find cRule and wRule. If the precedence of cRule is more than the precedence of wRule, then the case should be covered by cRule

Go through c again to find all rules that classify it wrongly and have a higher precedence than the corresponding cRule of c

Choose the final set of rules to form classifier

Choose the set of potential rules to form the classifier

Discard those rules that introduce more errors, and return the final classifier C

}

2.3.2 An Associative Classifier based on Positive and Negative Rules

The generation of class association rules using the support-confidence framework together with a correlation measure has been proposed [4]. Rule generation and classification are the two phases. The rule generation phase generates all valid positive and negative class association rules through the function PONERG (POsitive and NEgative Rule Generation). If the absolute value of the correlation is greater than the correlation threshold, the classification rule is considered of interest. If the correlation is positive, a positive association rule is discovered; if the correlation is negative, a negative association rule is discovered. After the rules are generated, their confidence is checked: if the confidence of a rule is greater than the minimum confidence, the rule is valid. The approach generates class association rules of the form X → c, where X is a set of features and c is a class label.

The generated rules are arranged in order of confidence and support. This sorted set of rules represents the associative classifier ARCPAN (Association Rule Classification with Positive And Negative). Given a new object, the classification process searches this set of rules for those classes that are relevant to the object presented for classification.

Algorithm: Classification Rule Generation - All Itemsets (TDB, ms, mc, corr)

{

C = set of class labels

F1= find frequent 1-itemset

for each i ∈ F1

{

for each c∈ C

find r, the correlation between i and c

if the correlation is positive then generate a rule i → c

if its confidence is greater than mc then add it to PCAR

else if the correlation is negative then generate two rules ┐i → c and i → ┐c

if confidence(┐i → c) is more than mc then add it to NCAR

if confidence(i → ┐c) is more than mc then add it to NCAR

}

for (K = 2; FK−1 ≠ ∅; K ++)

{

CK= generate candidates by joining FK−1 to F1

FK = set of frequent itemsets

for each i ∈ FK

repeat the correlation computation and rule generation steps above for each class c ∈ C

}

}

Algorithm: Classification of a new object ()

{

S = ∅ /*set of rules that match o*/

for each r ∈ ARCPAN

if (r ⊂ o) { count++ }

S = S ∪ r

if (count = 1)

fr.conf = r.conf

S = S ∪ r

else if (r.conf > fr.conf - τ)

S = S ∪ r

else break

divide S in subsets by category: S1, S2...Sn

for each subset S1, S2...Sn

sum/subtract the confidences of the rules and divide by the number of rules in Sk

scorek = (Σ r ∈ Sk ± r.conf) / |Sk|

put the new object in the class that has the highest confidence score

o = Ci, with scorei = max{score1..scoren}

}
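The scoring step can be sketched as follows (Python): within each class group, the confidences of rules that argue for the class are added and those of negative rules that argue against it are subtracted, and the total is averaged over the group. The rule representation is an assumption made for the illustration.

def score_classes(matching_rules):
    """matching_rules: list of (class_label, confidence, supports_class) tuples."""
    groups = {}
    for label, conf, supports_class in matching_rules:
        groups.setdefault(label, []).append(conf if supports_class else -conf)
    scores = {label: sum(v) / len(v) for label, v in groups.items()}
    best = max(scores, key=scores.get)           # class with the highest score
    return best, scores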

2.3.3 Multi-class, Multi-label Associative Classification

Multi-class, Multi-label Associative Classification (MMAC) algorithm consists of three stages: rules generation, recursive learning and classification [75].

Stage 1: Scan the training data to find and generate Class Association Rules (CAR).

Stage 2: Find more rules that pass the MinSupp and MinConf thresholds from the remaining unclassified instances, until no further frequent items can be found.

Stage 3: The rule sets derived at each iteration are merged to form a global multi-class label classifier, which is then tested against the test data.

Algorithm: MMAC (TDB, mc, ms)

{

/*Phase 1*/

Scan the training data TDB with n columns to discover frequent items

Produce rules seti by converting any frequent item that passes MinConf into a rule

Rank the rules set according to (confidence,support, …, etc)

Evaluate the rules seti in order to remove redundant rules

/*Phase 2*/

Discard instances Pi associated with rules seti

Generate new training data T1 = T − Pi

Repeat phase 1 on T1 until no further frequent item is found

/*Phase 3*/

Merge rules sets generated at each iteration to produce a multi-label classifier

Classify test objects

}

2.3.4 Multi-class Classification based on Association Rule

Multi-class Classification based on Association Rule (MCAR) is the first AC algorithm that used a vertical mining layout approach [74]. Two main phases are: rule generation and a classifier builder.

In the first phase, the training data set is scanned once to discover the potential rules of size one; MCAR then intersects the Tid-lists of the potential rules of size one to find the potential rules of size two, and so on.

In the second phase, the rules created are used to build a classifier by considering their effectiveness on the training data set. Potential rules that cover a certain number of training objects will be kept in the final classifier.
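The vertical step can be illustrated with a short Python fragment: the Tid-list of a 2-itemset is simply the intersection of the Tid-lists of its items, so its support is obtained without rescanning the data. The item names and threshold below are illustrative only.

tidlists = {"a": {1, 2, 3, 5}, "b": {2, 3, 4}, "c": {1, 4}}
min_sup_count = 2

frequent_pairs = {}
items = sorted(tidlists)
for x in range(len(items)):
    for y in range(x + 1, len(items)):
        tids = tidlists[items[x]] & tidlists[items[y]]     # Tid-list intersection
        if len(tids) >= min_sup_count:
            frequent_pairs[(items[x], items[y])] = tids    # support = len(tids)
print(frequent_pairs)    # {('a', 'b'): {2, 3}}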

Algorithm:MCAR(TDB, ms, mc )

{

Scan TDB for the set S of frequent single items

for each pair of disjoint items I1, I2 ∈ S

if <I1 ∪ I2 > passes the MinSupp threshold

S = S ∪ <I1 ∪ I2 >

until no items which pass MinSupp are found

for each item I ∈ S

Generate all rules I → c which pass the MinConf threshold

Rank all rules generated

Remove from S all rules I1 → c1 for which there is some rule I → c of a higher rank with I ⊆ I1

}

Rules Evaluation

A rule is significant if and only if it covers at least one training instance. After the rules have been generated and ranked, an evaluation step tests each rule against the training data set in order to remove rules that fail to classify at least a single instance. At each step, all rows correctly classified by the current rule are deleted from the training data set. Whenever a rule no longer classifies any rows of the data (because the instances it covers have already been correctly classified by higher ranked rules), that rule is removed from the rule set. This process ensures that only high-confidence rules remain in the MCAR classifier.

Classification

In classification, let R be the set of generated rules and D the test data. The basic idea of the proposed method is to choose a set of high-confidence, representative and general rules from R to cover D. When classifying a test object, the first rule in the set of ranked rules whose condition matches the object classifies it. This process ensures that only the highest ranked rules classify the test objects.

2.3.5 An Associative Classifier with Negative Rules

A simple and efficient approach has been proposed [21] for mining class association rules of the form conditionset → c, where the conditionset is a set of (attribute, value) pairs and c is a class label; each ruleitem is considered to be of the form (conditionset, c).

The rule confidence and support can be calculated by using the following formulae:

Confidence (rule) = ruleSupportCount / conditionSupportCount (2.9)

Support (rule) = ruleSupportCount / |D| (2.10)

Where, conditionSupportCount = number of cases in the dataset that contain conditionset.

ruleSupportCount = number of cases in the dataset D that contain conditionSet and are labeled with class c.

A candidate itemset is said to be legal if it satisfies the Apriori property. For each literal of a legal candidate, ACN replaces that literal with the corresponding negated literal, creates a new negative rule, and adds it to the negative rule set. The generation of positive rules continues without disturbance, and valuable negative rules are produced as by-products of the Apriori process.
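A small sketch (Python) of this negation step is given below; the representation of a condition set as a tuple of (literal, negated) pairs is an assumption made for the illustration.

def negative_candidates(condset):
    """Replace each literal of a positive candidate, in turn, by its negation."""
    out = []
    for k in range(len(condset)):
        literal, _ = condset[k]
        out.append(condset[:k] + ((literal, True),) + condset[k + 1:])
    return out

# For the positive candidate {A=1, B=0} this yields {not(A=1), B=0} and {A=1, not(B=0)}.
print(negative_candidates((("A=1", False), ("B=0", False))))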

Algorithm: ACNRuleGenerator ()

{

Generate frequent-1-Positive and Negative itemsets from the database and denote by L1 and N1 respectively

for(K = 2; LK-1 ≠ Φ; K++)

{

PCK = candidates generated for level K

for each candidate generated in the previous step

{

for each literal on the candidate

{

generate negative rule by replacing the literal by its corresponding negated literal

add this rule to NCK

}

}

LK = set of positive frequent candidates in PCK

NK = set of negative frequent candidates in NCK

}

}

Algorithm: ACN Classifier Builder ()

{

arrange the rules by taking the criteria as rule ranking

for each rule do the following

{

if( number of matching instances in the training data> 0)

{

if( rule= negative && accuracy on remaining data>threshold)

add the rule in classifier and remove those matched instances

}

if( rule = positive)

{

add the rule in classifier and remove those matched instances

}

}

if( database is uncovered)

Select majority class from remaining examples

else

Select majority class from entire training set

}


