Related Work on Mining Positive and Negative Association Rules


Chapter 2

The association rule mining task can be stated as follows. Let I be a set of items, and let T be a database of transactions, where each transaction has a unique identifier (tid) and contains a set of items. A set of items is also called an itemset. The support of an itemset X, denoted by σ(X), is the number of transactions that contain X. An itemset is frequent if its support is greater than a user-specified minimum support (min sup) value. An association rule is an implication of the form A → B, where A, B ⊆ I and A ∩ B = Φ. The support of the rule is given as σ(A ∪ B), and the confidence as σ(A ∪ B)/σ(A) (i.e., the conditional probability that a transaction contains B, given that it contains A). The mining task consists of two steps: 1) Find all frequent itemsets. 2) Generate rules that satisfy minimum confidence. The second step is relatively straightforward; therefore, the complexity of an algorithm mainly depends on the complexity of step 1.
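As a small illustration of these definitions, the following Python sketch computes σ(A ∪ B) and the confidence of A → B over a toy transaction database; the transactions and item names are invented purely for the example.

# Toy illustration of support and confidence (hypothetical mini-database).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

A, B = {"bread"}, {"milk"}
sup_rule = support(A | B)              # sigma(A U B) = 2
confidence = sup_rule / support(A)     # sigma(A U B) / sigma(A) = 2/3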

2.1.1 Brute-Force Approach

In this approach, first list all possible association rules, compute the support and confidence of each rule, and then prune the rules that do not satisfy the minsup and minconf thresholds given by the user. For a given number of items d, the total number of possible association rules can be calculated using the formula:

R = 3^d - 2^(d+1) + 1

For instance, for d = 6 the number of possible association rules is R = 602. The relationship between the number of items and the number of possible rules is shown below.

Figure 2.1: Relationship between Number of Items (d) and Number of Possible Rules ( R )

From Figure 2.1, it is evident that the number of possible association rules grows exponentially with the number of items. Therefore, this approach would be computationally prohibitive.
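The count of 602 rules for d = 6 can be checked directly; the short Python loop below evaluates R = 3^d - 2^(d+1) + 1 for a few values of d and makes the exponential growth visible.

for d in range(1, 11):
    R = 3**d - 2**(d + 1) + 1      # total number of possible rules over d items
    print(d, R)                    # d = 6 gives R = 602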

2.1.2 Finding Frequent Itemsets Using Candidate Generation

The Apriori algorithm was proposed for mining frequent itemsets for Boolean association rules [57]. The algorithm uses prior knowledge of frequent itemsets to generate the itemsets of the next level. In general, Apriori consists of two steps: candidate generation and finding frequent itemsets. Initially, it finds the frequent 1-itemsets from the database; the result set is denoted L1. L1 is used for generating the candidate 2-itemsets, from which L2 is found. L2 is used to generate the candidate 3-itemsets, from which L3 is found. The same process is repeated until no more frequent itemsets are found in the given dataset.

To improve the efficiency of the level-wise generation of frequent itemsets, an important property, called the Apriori property, is used to reduce the search space: all nonempty subsets of a frequent itemset must also be frequent.

Finding LK from LK-1 is a two-step process consisting of join and prune actions.

The join step: To find CK, a set of candidate K-itemsets is generated by joining LK-1 with itself. The join operation, LK-1 ⋈ LK-1, is performed, where two members of LK-1 are joinable if their first (K-2) items are common and their (K-1)th items differ.

The prune step: Using the Apriori property, this step eliminates from support counting those candidate K-itemsets that have a (K-1)-subset which is not frequent.

Algorithm: Apriori (TDB, ms)

{

Find the set of all frequent 1-itemsets from TDB and denote it by L1

for ( K = 2; LK-1 ≠ Φ ; K++)

{ /* Candidate Generation Phase*/

CK = LK-1 ⋈ LK-1

/* Pruning using Apriori property*/

for each candidate c ∈ CK

{

for each (K-1)-subset s of c

{

if s ∉ LK-1 then CK = CK – {c}

}

}

/* finding support counts of itemsets in CK */

Scan the database and find support for all c ∈ CK

LK = { c | c ∈ CK ∧ sup(c) ≥ ms }

L = L U LK

}

}
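For concreteness, the following Python sketch implements the same join-prune-count loop; the transaction representation (a list of item sets) and the absolute min_sup count are assumptions made for the illustration, not details of [57].

from itertools import combinations

def apriori(transactions, min_sup):
    """Return a dictionary mapping each frequent itemset (frozenset) to its support count."""
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {i: c for i, c in counts.items() if c >= min_sup}
    frequent = dict(L)
    k = 2
    while L:
        # Join step: LK-1 joined with itself
        prev = list(L)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Prune step: every (K-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Support counting: one scan of the database per level
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= set(t):
                    counts[c] += 1
        L = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(L)
        k += 1
    return frequent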

2.1.3 Generalizing Association Rules to Correlations

The concept of negative relationships was introduced by a group of researchers [63]. Statistical tests are employed to verify the independence between two variables. A correlation measure, whose value ranges between -1 and 1, was used to determine whether the relationship between the variables is positive or negative. A chi-squared based model is proposed for finding the set of minimal correlated itemsets.

Algorithm: χ2-support (α, s, p, TDB)

{

for each item i∈ I, find support ( i )

for each pair of items i1, i2 ∈ I such that support(i1) > s & support (i2) > s then C = C ∪ {i1, i2}

NOTSIG = Φ

if (C == Φ) return SIG

for each itemset ∈ C

construct the contingency table for the item set

if (cell count < p %)

continue with the next itemset in C

if ( χ2 table > χ2α )

SIG = SIG ∪ itemset

else

NOTSIG = NOTSIG ∪ itemset

Continue with the next itemset in C. If there are no more itemsets in C, then set C to be the set of all sets S such that every subset of size |S| - 1 of S is in NOTSIG, and repeat from the test of whether C is empty.

}
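The core of the test above is a chi-squared statistic over the 2 x 2 contingency table of an item pair. The sketch below (Python) shows one way such a statistic can be computed; the transaction format is an assumption, and the usual cut-off at the α = 0.05 level with one degree of freedom is 3.84.

def chi_squared_pair(transactions, i1, i2):
    """Chi-squared statistic of the 2 x 2 contingency table for items i1 and i2."""
    n = len(transactions)
    n1 = sum(1 for t in transactions if i1 in t)   # transactions containing i1
    n2 = sum(1 for t in transactions if i2 in t)   # transactions containing i2
    chi2 = 0.0
    for a in (True, False):          # a: does the transaction contain i1?
        for b in (True, False):      # b: does the transaction contain i2?
            observed = sum(1 for t in transactions
                           if (i1 in t) == a and (i2 in t) == b)
            expected = (n1 if a else n - n1) * (n2 if b else n - n2) / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2

# A value above 3.84 rejects independence of i1 and i2 at the 95% level.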

Mining Strong Negative Associations in a Large Database of Customer Transactions

A variant approach to mining negative associations has been proposed in which frequent itemsets are combined with domain knowledge in the form of a taxonomy [65]. As the algorithm is domain dependent and requires a predefined taxonomy, it is difficult to generalize.

To find the negative itemsets the steps involved are:

(1) Find all the generalized large itemsets in the data (i.e., itemsets at all levels in the taxonomy whose support is greater than the user specified minimum support)

(2) Identify the candidate negative itemsets based on the large itemsets and the taxonomy and assign them expected support.

(3) Count the actual support for the candidate itemsets and retain only the negative itemsets. The interest measure RI of a negative association rule X → ┐Y is given as RI = (E[support(X ∪ Y)] - support(X ∪ Y)) / support(X), where E[support(X)] denotes the expected support of an itemset X; a small numeric illustration follows below.
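A small numeric illustration of RI (Python); the support values are hypothetical numbers chosen only to show the computation.

def rule_interest(expected_sup_xy, sup_xy, sup_x):
    """RI = (E[support(X U Y)] - support(X U Y)) / support(X)."""
    return (expected_sup_xy - sup_xy) / sup_x

# E[support(X U Y)] = 0.20, support(X U Y) = 0.05, support(X) = 0.30  ->  RI = 0.5
print(rule_interest(0.20, 0.05, 0.30))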

Algorithm: NegativeCandidateGeneration (MinSup, MinRI)

{

L1 = {large 1-itemsets}

K = 2 /* K represents the pass number */

/* First generate all large itemsets */

while (LK-1 ≠ Φ)

{

CK = GenCand (LK-1)

for all transactions t ∈ TDB

{

Ct = subset(CK, t)

for all candidates c ∈ Ct

c.count++

}

LK = {c ∈ CK | c.count ≥ MinSup}

K = K +1

}

/* Now generate negative itemsets */

Delete all small 1-itemsets from the taxonomy

K = 2

While (LK ≠ Φ)

{

/* Generate negative candidates of size K */

NCK = GenNegCand (LK)

NC = NC ∪ NCK

K = K +1

}

for all transactions t ∈ TDB

{

NCt = subset(NCk, t)

for all candidates c ∈ NCt

c.count++

}

NK = {c ∈NCK|c.count < MinSup*MinRI}

}

Algorithm: RuleGeneration( nk)

{

for all negative itemsets nK of size K, K ≥ 2

{

H1 = {consequents of rules generated from nk with one item in the consequent}

call genrules (nK, H1, L2, LK-2)

}

}

Procedure genrules (nK, Hm, Lm+1, LK-m-1)

{

if (k > m+1)

{

Hm+1 = apriori-gen (Hm)

for all hm+1 ∈ Hm+1

{

if (hm+1 ∈ Lm+1)

{

if ((nk - hm+1) ∈ Lk-m-1)

{

RI = (E[sup(nk)] - sup(nk)) / sup(nk - hm+1)

if (RI ≥ MinRI)

{ output rule (nK - hm+1) ⇏ hm+1 }

else

{ delete hm+1 from Hm+1 }

}

else

{ delete hm+1 from Hm+1 }

}

}

}

call genrules (nK, Hm+1)

}

Mining Substitution Rules for Statistically Dependent Items

Substitution Rule Mining (SRM) has been proposed to mine substitution rules from a given dataset [73]. Mining substitution rules involves two procedures: finding concrete itemsets and generating substitution rules. To generate concrete itemsets, the authors used the chi-square test as a statistical measure of the dependency among the items within an itemset, and the correlation coefficient to find the correlation between itemsets.

Concrete itemsets are those itemsets which could be choices for customers with some purchasing purpose. To qualify an itemset as a concrete one, not only the purchasing frequency, i.e., the support of the itemset, but also the dependency of its items has to be examined, in order to establish that these items are purposefully purchased together by customers.

Let X = {x1, x2, …, xk} be a positive k-itemset. The chi-square value for the itemset X is computed as

Chi(X) = Σ over all Y (n · sup(Y) - EY)^2 / EY

where the summation ranges over the 2^k itemsets Y obtained from X by replacing any subset of its items with their complements, n is the total number of transactions, Y+ denotes the positive itemset in which all complement items in Y are replaced by their positive counterparts, such as {a, ┐b, ┐c}+ = {a, b, c} where a, b and c are positive items, and EY = n · Π P(yi) is the expected support of Y under the independence assumption, with P(yi) = sup(yi) for a positive item and P(┐yi) = 1 - sup(yi) for a complement item.
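The sketch below (Python) computes a chi-square statistic of this kind for a k-itemset by enumerating all 2^k positive/complement combinations; it follows the standard Pearson chi-square form under the item-independence assumption and is only an illustration consistent with the definitions above, not the authors' implementation.

from itertools import product

def chi_square_itemset(transactions, items):
    """Chi-square of itemset X = items against independence, over all 2^k sign patterns."""
    n = len(transactions)
    p = {x: sum(1 for t in transactions if x in t) / n for x in items}  # single-item supports
    chi2 = 0.0
    for signs in product((True, False), repeat=len(items)):  # True = positive item, False = complement
        observed = sum(1 for t in transactions
                       if all((x in t) == s for x, s in zip(items, signs)))
        expected = n
        for x, s in zip(items, signs):
            expected *= p[x] if s else (1.0 - p[x])
        if expected > 0:
            chi2 += (observed - expected) ** 2 / expected
    return chi2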

Algorithm: Apriori-Dual ( MinSup, MinConf )

{

Append the complement items whose positive counterpart is not originally present to each transaction

Generate the set of frequent (positive and negative) items, i.e., L1

Remove the negative items whose positive counterpart is not frequent from L1

for ( K = 2; ; K++ )

{

Generate the candidate set of K-itemsets from LK-1 i.e., CK = LK-1 ⋈ LK-1

if (CK = Φ) then break

Scan the transactions to calculate supports of all candidate K-itemsets

LK = { c ∈ CK | Sc ≥ MinSup}

}

/* procedure of negative association rule generation*/

for each negative itemset X in LK

{

Let ┐Y be the largest pure negative itemset such that ┐Y ⊂ X

if (X - ┐Y) is not an empty set // (X - ┐Y) is positive

if (Conf((X - ┐Y) → ┐Y) ≥ MinConf)

Output the rule (X - ┐Y) → ┐Y

}

}

Algorithm: SRM ( MinSup, MinConf, ρmin )

{

L1 = The set of all positive frequent itemsets

set of concrete itemsets = L1

for ( K = 2; ; K++ )

{

Generate the candidate set of k-itemsets from LK-1 i.e., CK = LK-1 ⋈ LK-1

if (CK = Φ) then break

Scan the transactions to calculate supports of all candidate K-itemsets

LK = { c ∈ CK| Sc ≥ MinSup}

for each frequent itemset X in LK

{

if (SX > Πxi ∈X Sxi ) &&( Chi(X) ≥ χ2df (X), α )

Add X to the set of concrete itemsets

}

}

/* procedure of substitution rule generation*/

for each pair of concrete itemsets X, Y

{

if (ρ (X, Y) < -ρmin )

if (sup(X → ┐Y) ≥ MinSup && conf(X → ┐Y) ≥ MinConf)

/* X → ┐Y is valid */

Output the substitution rule X ⊳ Y

}

}

Mining Negative Association Rules

A variant method for mining negative association rules has been proposed [82]. It consists of three steps:

(1) Find a set of positive rules.

(2) Generate negative rules based on existing positive rules and domain knowledge.

(3) Prune the redundant rules.

The support and confidence of the LHS negative rule (┐X → Y) and the RHS negative rule (X → ┐Y) can be computed from the positive quantities using the following formulae:

sup(┐X → Y) = sup(Y) - sup(X ∪ Y),  conf(┐X → Y) = (sup(Y) - sup(X ∪ Y)) / (1 - sup(X)) (2.5)

sup(X → ┐Y) = sup(X) - sup(X ∪ Y),  conf(X → ┐Y) = 1 - conf(X → Y) (2.6)
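A minimal sketch (Python) of these complement relations, with supports expressed as relative frequencies; the input numbers are hypothetical.

def negative_rule_measures(sup_x, sup_y, sup_xy):
    """Support and confidence of X -> not-Y and not-X -> Y from positive quantities."""
    rhs_negative = {                       # X -> not-Y
        "support": sup_x - sup_xy,
        "confidence": 1.0 - sup_xy / sup_x,
    }
    lhs_negative = {                       # not-X -> Y
        "support": sup_y - sup_xy,
        "confidence": (sup_y - sup_xy) / (1.0 - sup_x),
    }
    return rhs_negative, lhs_negative

# e.g. sup(X) = 0.4, sup(Y) = 0.5, sup(X U Y) = 0.1
print(negative_rule_measures(0.4, 0.5, 0.1))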

Algorithm: MNAR ( minsup, minconf, TDB)

{

Find all valid positive frequent itemsets (F) and Positive Association Rules (PAR) using Apriori Algorithm

/* Generate Negative Rules*/

Delete all infrequent items from the taxonomy

for all rules r ∈ PAR

{

TRSs = GNC(r)

for all rules tr ∈ TRSs

{

if (SM(tr.conf, t.conf) > confDeviate)

{

if( supp(Neg(tr)) > minsup & conf(Neg(tr)) > minconf))

NAR = NAR ∪ Neg(tr)

}

/* Pruning*/

if all members of LOS have common itemset that form {r1, r2,….. rn} ⊆ Rule

{

delete rk, where rk falls in the categories

}

}

}

Mining Positive and Negative Association Rules

The most common framework for association rule generation is the "support-confidence" one. A new framework, called support-confidence-correlation, has been proposed [40], in which the phases of mining frequent itemsets and generating strong association rules are combined. It generates the appropriate rules while finding the correlations within each candidate itemset, and thus redundant item combinations are eliminated. Indeed, for each generated candidate itemset, all possible combinations of items are computed to analyze their correlations. In the end, only rules generated from item combinations with strong correlation are kept. If the correlation is positive, a positive rule is discovered. If the correlation is negative, two negative rules are discovered. The negative rules produced are of the form X → ┐Y or ┐X → Y, which the authors term "confined negative association rules"; here the entire antecedent or consequent is either a conjunction of negated attributes or a conjunction of non-negated attributes.
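The correlation test that drives this framework can be sketched as follows (Python); the phi-style coefficient below is one standard way to obtain a value in [-1, 1] from itemset supports, and the threshold handling mirrors the pseudocode that follows, but the exact measure used in [40] should be taken from the paper itself.

import math

def correlation(sup_a, sup_b, sup_ab):
    """Phi-style correlation between itemsets A and B, ranging over [-1, 1]."""
    denom = math.sqrt(sup_a * (1 - sup_a) * sup_b * (1 - sup_b))
    return (sup_ab - sup_a * sup_b) / denom if denom else 0.0

def rule_type(sup_a, sup_b, sup_ab, rho_min=0.5):
    rho = correlation(sup_a, sup_b, sup_ab)
    if rho >= rho_min:
        return "positive rule candidate: A -> B"
    if rho <= -rho_min:
        return "confined negative rule candidates: A -> not-B and not-A -> B"
    return "correlation too weak: no rule"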

Algorithm: PNARG (ms, mc, ρmin , TDB)

{

if ρmin is undefined then ρmin = 0.5

Find L1, the set of frequent 1-itemsets

for (K = 2; LK-1 ≠ Φ; K + +)

{

CK = LK-1 ⋈ L1

for each i ∈ CK

{

s = support(TDB, i)

if (s ≥ ms) then LK = LK ∪ {i}

for each A,B where A ∪ B = i

{

ρ = correlation between A and B

if (ρ ≥ ρmin )

{

if (confidence(A → B) ≥ mc) then PAR = PAR ∪ {A → B}

else if (supp(┐A → ┐B) ≥ ms && confidence(┐A → ┐B) ≥ mc) then NAR = NAR ∪ {┐A → ┐B}

}

if (ρ ≤ -ρmin) /* ρ < 0 and |ρ| ≥ ρmin */

{

if (confidence(A → ┐B) ≥ mc) then NAR = NAR ∪ {A → ┐B}

if (confidence(┐A → B) ≥ mc) then NAR = NAR ∪ {┐A → B}

}

}

}

}

AR = PAR ∪ NAR

if (AR = Φ)

{

ρmin = ρmin - 0.1

if ( ρmin ≥ 0)

call PNARG( ms, mc, ρmin , TDB)

}

return AR

}

Efficient Mining of Both Positive and Negative Association Rules

A novel approach has been proposed to mine positive and negative association rules [79]. In this approach, the generation of valid positive and negative association rules is divided into the following steps:

Generate the set of frequent itemsets of interest (FI) and the set of infrequent itemsets of interest (IFI).

Extract positive rules of the form A → B from FI, and negative rules of the forms A → ┐B, ┐A → B and ┐A → ┐B from IFI.

To generate FI, IFI and the negative association rules, they developed three functions, namely fipis( ), iipis( ) and CPIR( ).

Algorithm: AllItemsetsofInterest(TDB, ms, mc, mi)

{

FI = Φ

IFI = Φ

L1 = set of frequent 1-itemsets

FI = FI ∪ L1

for (K = 2; LK−1 ≠ Φ; K ++)

{

CK = LK-1 ⋈ LK-1

LK = set of all frequent itemsets

NK = CK – LK

for each itemset i ∈ LK

{

if i is not frequent itemset of potential interest then LK = LK − {i }

FI = FI ∪ LK

}

for each itemset j ∈ NK

{

if j is not infrequent itemset of potential interest then NK = NK − {j}

IFI = IFI ∪ NK

}

}

return FI and IFI

}

The algorithm for extracting both positive and negative association rules with the probability ratio model for confidence checking is designed as follows:

Algorithm: PositiveAndNegativeAssociations ( TDB, ms, mc, mi)

{

for each frequent itemset i ∈ FI

{

for each A,B where A ∪ B = i

{

if( fipis(A, B ) )

if (CPIR(B|A) ≥ mc) then PAR = PAR ∪ {A → B} with confidence = CPIR(B|A) and support = supp(i)

if (CPIR(A|B) ≥ mc) then PAR = PAR ∪ {B → A} with confidence = CPIR(A|B) & support = supp(i)

}

}

/* Generate all negative association rules in IFI*/

for each itemset j ∈ IFI

{

for each A,B where A ∪ B = j

{

if( iipis(A, B ) )

{

if (CPIR(B|┐A) ≥ mc) then NAR = NAR ∪ {┐A → B} with confidence = CPIR(B|┐A) & support = supp(┐A ∪ B)

if (CPIR(┐A|B) ≥ mc) then NAR = NAR ∪ {B → ┐A} with confidence = CPIR(┐A|B) & support = supp(B ∪ ┐A)

if (CPIR(┐B|A) ≥ mc) then NAR = NAR ∪ {A → ┐B} with confidence = CPIR(┐B|A) & support = supp(A ∪ ┐B)

if (CPIR(A|┐B) ≥ mc) then NAR = NAR ∪ {┐B → A} with confidence = CPIR(A|┐B) & support = supp(┐B ∪ A)

if (CPIR(┐B|┐A) ≥ mc) then NAR = NAR ∪ {┐A → ┐B} with confidence = CPIR(┐B|┐A) & support = supp(┐A ∪ ┐B)

if (CPIR(┐A|┐B) ≥ mc) then NAR = NAR ∪ {┐B → ┐A} with confidence = CPIR(┐A|┐B) & support = supp(┐B ∪ ┐A)

}

}

}

}

The Conditional-Probability Increment Ratio (CPIR) of a pair of itemsets X and Y, used as the confidence measure of the rules above, is defined (for supp(X)(1 - supp(Y)) ≠ 0) as

CPIR(Y|X) = (supp(X ∪ Y) - supp(X) supp(Y)) / (supp(X) (1 - supp(Y)))
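Assuming the form of CPIR given above, a direct transcription into Python is shown below; the negated forms are obtained by substituting 1 - supp(·) and the corresponding joint supports, and the numeric example is hypothetical.

def cpir(sup_x, sup_y, sup_xy):
    """CPIR(Y|X) = (supp(X U Y) - supp(X)*supp(Y)) / (supp(X)*(1 - supp(Y)))."""
    denom = sup_x * (1.0 - sup_y)
    return (sup_xy - sup_x * sup_y) / denom if denom else 0.0

# CPIR(not-Y | X) follows by replacing supp(Y) with 1 - supp(Y) and
# supp(X U Y) with supp(X) - supp(X U Y); the other negated forms are analogous.
print(cpir(0.4, 0.5, 0.3))   # 0.5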

Mining Positive and Negative Association Rules from Large Databases

An innovative approach has been proposed [11]. In this approach, the generation of positive and negative association rules consists of five steps: (i) generate all positive frequent itemsets L(P1); (ii) for all itemsets I in L(P1), generate negative frequent itemsets of the form ┐(I1 I2); (iii) generate all negative frequent itemsets of the form ┐I1 ┐I2; (iv) generate all negative frequent itemsets of the form I1 ┐I2; and (v) generate all valid positive and negative association rules. The negative rules are generated without adding any additional interestingness measure to the support-confidence framework.

Procedure PNAR ( )

{

Find positive frequent itemsets

Find negative frequent itemsets of the form ┐( XY)

Find negative frequent itemsets of the form ┐X ┐ Y

Find negative frequent itemsets of the form ┐X Y

Find positive and negative association rules for the items generated in steps 1-4

}

An Effective Algorithm for Mining Positive and Negative Association Rules

Mining positive and negative association rules approach consists of three sub problems [23]:

(1) Finding the frequent itemsets (PL) and infrequent itemsets (NL).

(2) From PL, find valid positive rules of the form A → B.

(3) From NL, find valid negative rules of the forms A → ┐B and ┐A → B.

Let TDB be a transactional database, and let ms, mc, dms and dmc be given by the user. The algorithm for extracting both positive and negative association rules with the correlation coefficient measure

corrA,B = supp(A ∪ B) / (supp(A) * supp(B)) (2.8)

is designed as follows:

Algorithm: PNAR (TDB, ms, mc, dms, dmc )

{

positiveAR = Φ ; negativeAR = Φ /*positive and negative AR itemsets*/

PL= Φ ; NL= Φ

L1 = the set of frequent 1-itemsets

PL= PL∪L1

for (K = 2; LK-1 ≠ Φ ; K + +)

{

perform join operation on LK-1 to itself and let it be CK

for each i ∈ CK

{ S = supp( i )

if (S ≥ ms)

{ LK = LK ∪ {i}

PL = PL ∪ LK

}

else

{

NLK = NLK ∪ {i}

NL = NL ∪ NLK

}

}

}

for each frequent itemset i ∈ PL

{

for each expression A ∪ B = i and A ∩ B = φ

{

corrA,B = supp(A∪B) / ( supp(A) * supp(B))

if (corrA,B >1)

if (conf(A → B) ≥ mc )

positiveAR = positiveAR ∪ { A → B }

}

}

for each infrequent itemset i ∈ NL

{

for each expression A ∪ B = i and A ∩ B = φ

{

corrA,B = supp(A∪B) / ( supp(A) * supp(B))

if(corrA,B < 1)

if (supp(A ┐B ) ≥ dms && conf(A ┐B) ≥ dmc )

{ negativeAR = negativeAR ∪{ A ┐B } }

if(supp(┐A B ) ≥ dms && conf(┐A B) ≥ dmc)

{ negativeAR = negativeAR ∪{ ┐A B } }

}

}

AR = positiveAR ∪ negativeAR

return AR

}

2.2 Related Work on Indirect Association Rule Mining

Indirect association and negative association are similar in that both deal with infrequent itemsets. A negative association rule identifies a set of items that a customer is not likely to buy together with a certain set of other items. The most significant difference between negative associations and indirect associations is that a mediator is central to the concept of indirect associations.

2.2.1 Mining Higher Order Dependencies in Data

The first algorithm for mining indirect associations between pairs of items is presented below [72].

Algorithm: INDIRECT (TDB, ts,td)

{

Find L, the set of all frequent itemsets using Apriori algorithm

P = Φ

for ( K = 2 ; K≤ n; K++)

{

CK+1 = join LK to itself

Consider any two items x, y ∈ CK+1 such that i ⊂ x, j ⊂ y and x ∩ y = M

for each < i, j , M > ∈ CK+1

{

if (sup({i , j}) < ts and dep({i}, M)≥ td and dep({j}, M) ≥ td)

{

P = P ∪ {< i , j , M> }

}

}

}

}

There are two major phases in this algorithm:

1. Find frequent itemsets using Apriori algorithm

2. Discover all indirect associations by performing Candidate generation and pruning

INDIRECT algorithm requires two join operations. The first one is in the Apriori algorithm in order to generate all frequent itemsets and the other one is in the generation of candidates for mining indirect associations. In general, join operation is quite expensive. The join operation in INDIRECT is more expensive than the join operation in the Apriori.
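To make the candidate test concrete, the sketch below (Python) checks whether an item pair (i, j) is indirectly associated via a mediator set M. The sup() callback is assumed to return relative supports, and the IS (cosine) measure is used here as one possible choice for the dependence function dep(); the cited work may use a different dependence measure.

def is_indirect(sup, i, j, mediator, ts, td):
    """Test the indirect-association pattern <i, j | M>."""
    def dep(x, m):
        # IS (cosine) dependence between item x and mediator itemset m
        return sup(frozenset({x}) | m) / (sup(frozenset({x})) * sup(m)) ** 0.5
    if sup(frozenset({i, j})) >= ts:
        return False                 # the pair itself must be infrequent
    m = frozenset(mediator)
    return dep(i, m) >= td and dep(j, m) >= td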

2.2.2 Indirect Association Mining

The proposed Indirect Association Mining (IAM) algorithm proceeds in four phases: an initialization phase, a pruning phase, a bridge itemset calculation phase, and a ranking phase. The purpose of the initialization phase is to allocate the memory needed: an item-pair support value matrix M is constructed and then updated by scanning the entire database through the function MatrixUpdate(). The second phase is a pruning process whose purpose is to minimize the search space of the problem; the pruning threshold is MinSup(s). The third phase, the bridge itemset calculation phase, is the most important one of this algorithm. In this phase, the items A and B, which may be indirectly associated, are first obtained by calling the function ItemMap(); the bridge itemset is then calculated by calling the function BridgeFind(). The bridge itemset X and the two indirectly associated items A and B are saved, and for every non-empty itemset X the closeness is inserted into a linked vector C. The last phase, the ranking phase, mainly performs the ranking operation according to the closeness values in the linked vector C, so as to provide decision makers with the most useful indirect association rules [35].

Algorithm: IndirectAssociationMining ( T, X, min-sup(s), min-sup(F) , min-con(F), min-dpd, M, mi, C)

{

/* Initialization Phase*/

M = malloc([mi = 0])

for each transaction t∈ T

{ MatrixUpdate(M,ti) }

/*Pruning Phase*/

PruneMatrix(M, min_sup(s))

/* Bridge Itemset Calculation Phase*/

for each element mi ∈ M

{

When mi≠ 0

{

A = ItemMap(mi ,0)

B= ItemMap(mi , 1)

}

X=BridgeFind(M,A,B, min-sup(F), min-con(F), min_dpd)

When (X ≠ Φ)

{

C = VectorInsert(c(A,B)|X)

Save(X, A, B)

}

}

/*Ranking Phase*/

VectorSort(C)

}

2.2.3 An Efficient Approach to Mining Indirect Associations

For mining indirect associations between items, a new approach based on HI-Struct has been presented [78]. The algorithm is described as follows.

HI-Mine()

{

Construct the HI-struct data structure

Find item pair support and mediator support for each frequent item

Generate all indirect associations between item pairs whose item pair support is less than item pair threshold and mediator support is greater than mediator support threshold

}

2.3 Related Work on Classification based on Association Rules

The problem of associative classification is to discover a set of valid class association rules in order to build a classifier that assigns a class label to a given object.

2.3.1 Integrating Classification and Association Rule Mining

The Classification Based on Associations (CBA) algorithm was one of the first AC (Associative Classification) algorithms; it employs an Apriori candidate generation step to find the rules [6]. The steps of CBA are as shown below:

If any continuous attribute is present then perform discretization.

Find all class association rules whose support is greater than minimum support.

Find a subset of rules, generated in step 2, whose confidence is more than minimum confidence and then form a classifier.

Drawbacks:

CBA generates many rules for dominant classes

Few or no rules for the minority classes.

To overcome these drawbacks, the same authors presented a modified version, called CBA(2), which uses a separate support threshold for each class, based on the class frequencies in the training data set.

It consists of two parts: (1) CBA-RG, which generates class association rules using the Apriori algorithm, and (2) CBA-CB, which constructs a classifier based on the rules generated in the previous step.

The CBA-RG Algorithm

The Apriori process is used to generate frequent rule items by scanning the database. In the first pass, all individually frequent rule items in the dataset are found. In each subsequent pass, the frequent rule items found in the previous pass are used as seeds to generate the candidate rule items of the current pass, from which the frequent rule items are selected. The same process is repeated until no more frequent rule items are found in the database.

Algorithm: CBA-RG ( )

{

F1 = find frequent 1-ruleitems

CAR1 = genRules(F1)

prCAR1 = pruneRules (CAR1)

for (K = 2; FK-1≠Φ; K++)

{

CK = generate candidate rule items from FK-1

for each data case d∈ D

{

Cd = ruleSubset(CK, d)

for each candidate c∈ Cd

{

c.condsupCount++

if (d.class = c.class )

{

c.rulesupCount++

}

}

}

FK = set of all frequent rule items whose support is more than minimum support

CARK = genRules(FK);

prCARK = pruneRules(CARK)

}

CARs = UK CARK

prCARs = UK prCARK

}
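The support-counting step of CBA-RG can be sketched as follows (Python). A ruleitem is modelled here as a (condset, class) pair and a data case as an (itemset, label) pair; these representations are assumptions made for the illustration.

def count_ruleitems(data, candidates):
    """Count condsupCount and rulesupCount for each candidate ruleitem in one scan."""
    cond_count = {c: 0 for c in candidates}
    rule_count = {c: 0 for c in candidates}
    for items, label in data:                    # one data case
        for condset, cls in candidates:
            if condset <= items:                 # the case covers the condition set
                cond_count[(condset, cls)] += 1
                if label == cls:                 # ... and agrees on the class label
                    rule_count[(condset, cls)] += 1
    return cond_count, rule_count  # confidence = rulesupCount / condsupCount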

Building a Classifier

A subset of the rules produced by the previous phase is selected to construct a classifier, in such a way that the selected rules minimize the number of errors made on the training data. The proposed algorithm is a heuristic one; however, the resulting classifier performs very well compared with the one built by C4.5. A naive version of the algorithm (called M1) for building such a classifier is shown below.

Algorithm:CBA_CB_ M1( )

{

Sort the set of generated rules ( R ) according to their precedence

Select rules for classifier C from R

Discard those rules in C that do not improve the accuracy of the classifier

}

The drawback of the above algorithm is the number of passes over the database. The modified version of the algorithm is presented below.

Algorithm: CBA_CB_M2( )

{

For each case c, find cRule and wRule. If the precedence of cRule is more than the precedence of wRule, then the case should be covered by cRule

Go through c again to find all rules that classify it wrongly and have a higher precedence than the corresponding cRule of c

Choose the final set of rules to form classifier

Choose the set of potential rules to form the classifier

Discard those rules that introduce more errors, and return the final classifier C

}

2.3.2 An Associative Classifier based on Positive and Negative Rules

The generation of class association rules using the support-confidence framework together with a correlation measure has been proposed [4]. Rule generation and classification are the two phases. The rule generation phase generates all valid positive and negative class association rules through the function PONERG (POsitive and NEgative Rule Generation). If the absolute value of the correlation is greater than the correlation threshold, the classification rule is considered of interest. If the correlation is positive, a positive association rule is discovered; if the correlation is negative, a negative association rule is discovered. After the rules are generated, their confidence is checked: if the confidence of a rule is greater than the minimum confidence, the rule is valid. The approach generates class association rules of the form X → c, where X is a set of features and c is a class label.

The generated rules are arranged in order of confidence and support. This sorted set of rules represents the associative classifier ARCPAN (Association Rule Classification with Positive And Negative). Given a new object, the classification process searches this set of rules for those classes that are relevant to the object presented for classification.

Algorithm: Classification Rule Generation - All Itemsets (TDB, ms, mc, corr)

{

C = set of class labels

F1= find frequent 1-itemset

for each i ∈ F1

{

for each c∈ C

find r, the correlation between i and c

if the correlation is positive then generate a rule i → c

if its confidence is greater than mc then add it to PCAR

else if the correlation is negative then generate two rules ┐i → c and i → ┐c

if confidence(┐i → c) is more than mc then add it to NCAR

if confidence(i → ┐c) is more than mc then add it to NCAR

}

for (K = 2; FK−1 ≠ ∅; K ++)

{

CK= generate candidates by joining FK−1 to F1

FK = set of frequent itemsets

for each i ∈ FK

repeat the correlation computation and rule generation steps above for each class c ∈ C

}

}

Algorithm: Classification of a new object ()

{

S = ∅ /*set of rules that match o*/

for each r ∈ ARCPAN

if (r ⊂ o) { count++ }

S = S ∪ r

if (count = 1)

fr.conf = r.conf

S = S ∪ r

else if (r.conf > fr.conf - τ)

S = S ∪ r

else break

divide S in subsets by category: S1, S2...Sn

for each subset S1, S2...Sn

sum/subtract the confidences of the rules and divide by the number of rules in Sk

scorek = (Σ r ∈ Sk ± r.conf) / |Sk|

put the new object in the class that has the highest confidence score

o = Ci, with scorei = max{score1..scoren}

}
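The scoring step can be sketched as follows (Python): within each class group, the confidences of rules that argue for the class are added and those of negative rules that argue against it are subtracted, and the total is averaged over the group. The rule representation is an assumption made for the illustration.

def score_classes(matching_rules):
    """matching_rules: list of (class_label, confidence, supports_class) tuples."""
    groups = {}
    for label, conf, supports_class in matching_rules:
        groups.setdefault(label, []).append(conf if supports_class else -conf)
    scores = {label: sum(v) / len(v) for label, v in groups.items()}
    best = max(scores, key=scores.get)           # class with the highest score
    return best, scores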

2.3.3 Multi-class, Multi-label Associative Classification

Multi-class, Multi-label Associative Classification (MMAC) algorithm consists of three stages: rules generation, recursive learning and classification [75].

Stage 1: Scan the training data to find and generate Class Association Rules (CAR).

Stage 2: Find more rules that pass the MinSupp and MinConf thresholds from the remaining unclassified instances, until no further frequent items can be found.

Stage 3: The rule sets derived at each iteration are merged to form a global multi-class label classifier, which is then tested against the test data.

Algorithm: MMAC (TDB, mc, ms)

{

/*Phase 1*/

Scan the training data TDB with n columns to discover frequent items

Produce rules seti by converting any frequent item that passes MinConf into a rule

Rank the rules set according to (confidence,support, …, etc)

Evaluate the rules seti in order to remove redundant rules

/*Phase 2*/

Discard instances Pi associated with rules seti

Generate new training data T1 = T − Pi

Repeat phase 1 on T1 until no further frequent item is found

/*Phase 3*/

Merge rules sets generated at each iteration to produce a multi-label classifier

Classify test objects

}

2.3.4 Multi-class Classification based on Association Rule

Multi-class Classification based on Association Rule (MCAR) is the first AC algorithm that used a vertical mining layout approach [74]. Two main phases are: rule generation and a classifier builder.

In the first phase, the training data set is scanned once to discover the potential rules of size one; MCAR then intersects the Tid-lists of the potential rules of size one to find the potential rules of size two, and so on.

In the second phase, the rules created are used to build a classifier by considering their effectiveness on the training data set. Potential rules that cover a certain number of training objects will be kept in the final classifier.
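The vertical step can be illustrated with a short Python fragment: the Tid-list of a 2-itemset is simply the intersection of the Tid-lists of its items, so its support is obtained without rescanning the data. The item names and threshold below are illustrative only.

tidlists = {"a": {1, 2, 3, 5}, "b": {2, 3, 4}, "c": {1, 4}}
min_sup_count = 2

frequent_pairs = {}
items = sorted(tidlists)
for x in range(len(items)):
    for y in range(x + 1, len(items)):
        tids = tidlists[items[x]] & tidlists[items[y]]     # Tid-list intersection
        if len(tids) >= min_sup_count:
            frequent_pairs[(items[x], items[y])] = tids    # support = len(tids)
print(frequent_pairs)    # {('a', 'b'): {2, 3}}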

Algorithm:MCAR(TDB, ms, mc )

{

Scan TDB for the set S of frequent single items

for each pair of disjoint items I1, I2 ∈ S

if <I1 ∪ I2 > passes the MinSupp threshold

S = S ∪ <I1 ∪ I2 >

until no items which pass MinSupp are found

for each item I ∈ S

Generate all rules I → c which pass the MinConf threshold

Rank all rules generated

Remove from S all rules I1 → c1 for which there is some rule I → c of a higher rank with I ⊆ I1

}

Rules Evaluation

A rule is significant if and only if it covers at least one training instance. After the rules have been generated and ranked, an evaluation step tests each rule against the training data set in order to remove rules that fail to classify at least a single instance. At each step, all rows correctly classified by the current rule are deleted from the training data set. Whenever a rule no longer classifies any rows of the data (because the instances it covers have already been correctly classified by higher ranked rules), that rule is removed from the rule set. This process ensures that only high-confidence rules remain in the MCAR classifier.

Classification

In classification, let R be the set of generated rules and D the test data. The basic idea of the proposed method is to choose a set of high-confidence, representative and general rules from R to cover D. When classifying a test object, the first rule in the set of ranked rules whose condition matches the object classifies it. This process ensures that only the highest ranked rules classify the test objects.

2.3.5 An Associative Classifier with Negative Rules

A simple and efficient approach has been proposed [21] for mining class association rules of the form conditionset → c, where the conditionset is a set of (attribute, value) pairs and c is a class label; each ruleitem is considered to be of the form (conditionset, c).

The rule confidence and support can be calculated by using the following formulae:

Confidence (rule) = ruleSupportCount / conditionSupportCount (2.9)

Support (rule) = ruleSupportCount / |D| (2.10)

Where, conditionSupportCount = number of cases in the dataset that contain conditionset.

ruleSupportCount = number of cases in the dataset D that contain conditionSet and are labeled with class c.

A candidate itemset is said to be legal if it satisfies the Apriori property. For each literal of a legal candidate, ACN replaces that literal with the corresponding negated literal, creates a new negative rule, and adds it to the negative rule set. The generation of positive rules continues without disturbance, and valuable negative rules are produced as by-products of the Apriori process.
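A small sketch (Python) of this negation step is given below; the representation of a condition set as a tuple of (literal, negated) pairs is an assumption made for the illustration.

def negative_candidates(condset):
    """Replace each literal of a positive candidate, in turn, by its negation."""
    out = []
    for k in range(len(condset)):
        literal, _ = condset[k]
        out.append(condset[:k] + ((literal, True),) + condset[k + 1:])
    return out

# For the positive candidate {A=1, B=0} this yields {not(A=1), B=0} and {A=1, not(B=0)}.
print(negative_candidates((("A=1", False), ("B=0", False))))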

Algorithm: ACNRuleGenerator ()

{

Generate frequent-1-Positive and Negative itemsets from the database and denote by L1 and N1 respectively

for(K = 2; LK-1 ≠ Φ; K++)

{

PCK = candidates generated for level K

for each candidate generated in the previous step

{

for each literal on the candidate

{

generate negative rule by replacing the literal by its corresponding negated literal

add this rule to NCK

}

}

LK = set of positive frequent candidates in PCK

NK = set of negative frequent candidates in NCK

}

}

Algorithm: ACN Classifier Builder ()

{

arrange the rules by taking the criteria as rule ranking

for each rule do the following

{

if( number of matching instances in the training data> 0)

{

if( rule= negative && accuracy on remaining data>threshold)

add the rule in classifier and remove those matched instances

}

if( rule = positive)

{

add the rule in classifier and remove those matched instances

}

}

if( database is uncovered)

Select majority class from remaining examples

else

Select majority class from entire training set

}


