Mining Non Redundant Rules

02 Nov 2017


Chapter 5

Introduction

Han et al. [48] define multi-level association rules as follows: "Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multi-level association rules. Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework".

Multi- and cross-level mining can discover knowledge that the single-level approach cannot, and this new knowledge may be highly relevant or interesting to a given user [5, 21]. In fact, multi-level rule mining is useful for discovering new knowledge that is missed by conventional algorithms [36]. If a dataset has a hierarchical structure, multi-level or cross-level rule mining will discover more interesting knowledge. Multi-level rules span multiple levels of abstraction, but the items within any one rule come from the same concept level; such rules can sit at different levels and contain more general or more specific information than single-level rules. The intermediate results from high levels of abstraction can be used to mine lower levels and refine the process [18]. Multi- and cross-level association rules can be mined under the support-confidence framework using uniform support, reduced support or group-based support. Cross-level rules are those in which the items within a single rule come from different levels of abstraction, i.e. two items in the rule come from different levels of the hierarchy. Han & Fu [18] proposed relaxing the condition that all items in a rule belong to the same concept level, allowing the exploration of what they termed 'level-crossing' relationships, which contain items at different abstract levels [16, 18].

A multi-level dataset is one that has a taxonomy or concept tree, like the example shown in Figure 4.1. The items in the dataset exist at the lowest concept level but are part of a hierarchical structure. Thus, in this example, 'Battlestar Galactica' is an item at the lowest level of the taxonomy; it also belongs to the high-level concept category 'Science Fiction' and the more refined category 'Futuristic'. Each entry in the hierarchy has one parent, giving a unique path from the root, and the hierarchy information encoded with each item records that item's ancestry. The root itself is coded 0, and under each parent the children are numbered from left to right. An item is encoded as i-j-k: the first digit i indicates that it belongs to the i-th category at the first concept level (the one immediately under the root); the second digit j indicates the j-th sub-category at the second concept level under that category; the third digit k indicates the k-th sub-category at the third concept level, and so on. For example, the item 'Battlestar Galactica' is encoded as 1-1-2, while its sibling node 'Star Trek' receives its own distinct code under the same parent.
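The i-j-k encoding described above can be sketched as follows; the helper name is illustrative, not the thesis implementation.

```python
def ancestors(code):
    """Generalized codes for an encoded item, most general first.
    Each digit fixes the branch taken at one concept level, so replacing
    trailing digits with '*' yields the item's ancestor categories."""
    parts = code.split("-")
    return ["-".join(parts[:i] + ["*"] * (len(parts) - i))
            for i in range(1, len(parts))]

# 'Battlestar Galactica' (encoded 1-1-2) descends from the level-1
# category 1-*-* and the level-2 category 1-1-*:
print(ancestors("1-1-2"))  # ['1-*-*', '1-1-*']
```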

The present work briefly introduces the related work on multi-level association rules, followed by the basic concepts, definitions and algorithms for the exact and reliable bases. It then removes hierarchically redundant rules from multi-level datasets. Furthermore, the work shows that the resulting concise representation of non-redundant multi-level association rules is lossless, since all association rules can be derived from the condensed representation.

Fig 4.1 Multi Level Data set

Related Work on Multi-Level Association Rules

Han et al. [14, 15, 19] presented the earliest approaches to finding frequent itemsets in multi-level datasets, proposing an algorithm called ML_T2L1 [15, 19]. This is a progressive deepening method used to generate multiple-level association rules. The dataset hierarchy is encoded and used in the transaction table to generate the multi-level rules. The algorithm processes each level of the hierarchy in turn and filters out any item that does not meet the user-specified minimum support threshold, so pruning is performed at the same time. If a high-level topic is not frequent, its subtopics can never be frequent, and there is no need to process them. Han et al. [15] focused on finding the frequent itemsets at each of the levels in the dataset (e.g. frequent itemsets from level 1, level 2 and so on).

Thakur et al. [44] proposed a top-down progressive deepening method. This approach determines the frequent itemsets at each concept level, starting at the highest level (closest to the root of the hierarchy). Any items found to be infrequent are filtered out along with their sub-items, so that at lower levels only the descendants of frequent ancestors are considered, reducing the computation cost.
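The top-down pruning idea shared by these progressive deepening methods can be sketched as follows. This is a hypothetical illustration, not the authors' code: an item is only counted at level l if its parent was frequent at level l-1, so the subtrees of infrequent high-level topics are never explored. The counts and thresholds are illustrative.

```python
def mine_top_down(counts_by_level, minsups):
    """counts_by_level[l]: dict mapping level-(l+1) codes to support counts.
    Returns one set of frequent codes per level, pruning any item whose
    parent was not frequent one level up."""
    frequent_prev = None
    all_frequent = []
    for counts, minsup in zip(counts_by_level, minsups):
        frequent = set()
        for code, count in counts.items():
            parent = code.rsplit("-", 1)[0]
            # prune: skip items whose parent was not frequent one level up
            if frequent_prev is not None and parent not in frequent_prev:
                continue
            if count >= minsup:
                frequent.add(code)
        all_frequent.append(frequent)
        frequent_prev = frequent
    return all_frequent

levels = [
    {"1": 5, "2": 5, "3": 3},                            # level-1 counts
    {"1-1": 5, "1-2": 4, "2-1": 4, "2-2": 4, "3-2": 3},  # level-2 counts
]
result = mine_top_down(levels, [4, 3])
print(sorted(result[0]))  # ['1', '2']  ('3' fails minsup 4)
print(sorted(result[1]))  # ['1-1', '1-2', '2-1', '2-2']  ('3-2' pruned)
```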

Adaptive-FP was proposed by R. Mao [32] to perform multi-level association rule mining based on the FP-Growth approach [5, 10]. It first finds the frequent 1-itemsets and the concept level each belongs to. Adaptive-FP uses a different support threshold for each concept level, rather than the same support for every level, and treats all frequent itemsets uniformly regardless of concept level. When it comes to rule generation, the FP-tree is revisited to derive the association rules [32].

Another FP-Growth-based approach to multi-level rule mining is FP'-Tree, proposed by Ong, Ng & Lim [36]. Their approach differs from traditional rule mining algorithms in that it takes into account the recurrence relationships within a transaction. The recurrence is simply the quantity of an item and can be used easily when there is a transactional dataset. Unlike Adaptive-FP, this approach builds a separate FP-tree for each concept level being mined. Although the approach appears promising, no consideration is given to discovering the cross-level association rules contained within the dataset; the focus is solely on obtaining multi-level rules through the use of recurrence and quantity.

Hong, Lin & Wang [24, 25] developed a fuzzy Apriori-based approach for mining multi-level association rules using techniques similar to FDM. It works on quantitative datasets and relies on knowing the membership functions in advance. These approaches may not generate the complete set of rules, so some information may be lost, but they generate the most important rules because these include the most important fuzzy terms for the items [24, 25].

Kaya & Alhajj [24] proposed an approach based on the work of Han & Fu [16, 28] and Hong et al. [24, 25], in that it uses fuzzy set theory, weighted mining and linguistic terms. They used support and confidence to measure both rule interestingness and item importance. It is claimed that the resulting rules are more meaningful and more understandable to users, and that the approach produces consistent and meaningful results. However, it has been tested on a synthetic dataset only, not on a 'real-world' dataset [24].

Multi Level Association Rules

Frequent Multi-level Itemsets: An itemset A is frequent in a set of transactions S at level l if the support of A is not less than the corresponding minimum support threshold minsuppl [15, 19]. The confidence of a rule A → B is high at level l if it is not less than the corresponding minimum confidence threshold minconfl. Furthermore, if a rule A → B is strong for a set S, then the ancestor of every item in A and B is frequent at its corresponding level, A ∪ B is frequent at the current level, and the confidence of A → B is high at the current level [15, 19, 44].

The following multi-level sample dataset [15, 19] is used to discover the multi/cross-level association rules.

Transaction ID   Itemset
1                {1-1-1, 1-2-1, 2-1-1, 2-2-1}
2                {1-1-1, 2-1-1, 2-2-2, 3-2-3}
3                {1-1-2, 1-2-2, 2-2-1, 4-1-1}
4                {1-1-1, 1-2-1}
5                {1-1-1, 1-2-2, 2-1-1, 2-2-1, 4-1-3}
6                {2-1-1, 3-2-3, 5-2-4}
7                {3-2-3, 4-1-1, 5-2-4, 7-1-3}

Table 4.1. Multilevel Transaction Dataset

This simple multi-level dataset has three concept levels, with each item belonging to the lowest level. The item ID in the table encodes the hierarchy information for each item. Thus, the item 1-2-1 belongs to the first category at level 1; at level 2 it belongs to the second sub-category of that category; and at level 3 it belongs to the first sub-category of its parent category at level 2. The frequent itemsets discovered are shown in the following tables. For the multi-level frequent itemsets, the minimum support is 4 for the first level, 3 for the second level and 3 for the third level.
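As a sketch (not the thesis code), the level-wise supports behind the following tables can be reproduced from Table 4.1 by generalizing each item code to the concept level being counted:

```python
# Transactions from Table 4.1.
transactions = [
    {"1-1-1", "1-2-1", "2-1-1", "2-2-1"},
    {"1-1-1", "2-1-1", "2-2-2", "3-2-3"},
    {"1-1-2", "1-2-2", "2-2-1", "4-1-1"},
    {"1-1-1", "1-2-1"},
    {"1-1-1", "1-2-2", "2-1-1", "2-2-1", "4-1-3"},
    {"2-1-1", "3-2-3", "5-2-4"},
    {"3-2-3", "4-1-1", "5-2-4", "7-1-3"},
]

def generalize(item, level):
    """Replace digits below `level` with '*', e.g. ('2-2-1', 2) -> '2-2-*'."""
    parts = item.split("-")
    return "-".join(parts[:level] + ["*"] * (len(parts) - level))

def support(itemset, level):
    """Number of transactions containing every item of `itemset`,
    with items compared at the given concept level."""
    target = {generalize(i, level) for i in itemset}
    return sum(1 for t in transactions
               if target <= {generalize(i, level) for i in t})

print(support({"1-1-1"}, 3))           # 4 -> frequent at level 3 (minsup 3)
print(support({"1-1-1", "2-1-1"}, 3))  # 3
print(support({"2-2-1"}, 2))           # 4 = support of [2-2-*]
```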

1-Itemsets       2-Itemsets          3-Itemsets
[1-*-*]          [1-*-*, 2-*-*]      [1-1-*, 1-2-*, 2-2-*]
[2-*-*]          [1-1-*, 1-2-*]      [1-1-*, 2-1-*, 2-2-*]
[1-1-*]          [1-1-*, 2-1-*]
[1-2-*]          [1-1-*, 2-2-*]
[2-1-*]          [1-2-*, 2-2-*]
[2-2-*]          [2-1-*, 2-2-*]
[1-1-1]          [1-1-1, 2-1-1]
[2-1-1]
[2-2-1]

Table 4.2. Frequent itemsets

Table 4.2 above contains all the frequent itemsets whose items come from a single concept level. The cross-level itemsets are:

1-itemsets       2-itemsets          3-itemsets
[1-*-*]          [1-*-*, 2-*-*]      [1-*-*, 2-1-*, 2-2-*]
[2-*-*]          [1-*-*, 2-1-*]      [2-*-*, 1-1-*, 1-2-*]
[1-1-*]          [1-*-*, 2-2-*]      [1-1-*, 1-2-*, 2-2-*]
[1-2-*]          [2-*-*, 1-1-*]      [1-1-*, 2-1-*, 2-2-*]
[2-1-*]          [2-*-*, 1-2-*]      [1-*-*, 2-1-*, 2-2-*]
[2-2-*]          [1-1-*, 1-2-*]      [1-1-*, 2-1-1, 2-2-*]
[1-1-1]          [1-1-*, 2-1-*]      [1-1-*, 2-2-1, 1-2-*]
[2-1-1]          [1-1-*, 2-2-*]      [2-1-*, 1-1-1, 2-2-*]
[2-2-1]          [1-2-*, 2-2-*]      [2-2-*, 1-1-1, 2-1-1]
                 [2-1-*, 2-2-*]
                 [1-*-*, 2-1-1]
                 [1-*-*, 2-2-1]
                 [2-*-*, 1-1-1]
                 [1-1-*, 2-1-1]
                 [1-1-*, 2-2-1]
                 [1-2-*, 1-1-1]
                 [1-2-*, 2-2-1]
                 [2-1-*, 1-1-1]
                 [2-2-*, 1-1-1]
                 [2-2-*, 2-1-1]
                 [1-1-1, 2-1-1]

Table 4.3. Frequent itemsets with Cross-level

The frequent itemsets shown in Table 4.3 include multi/cross-level itemsets (where the items within a single itemset come from different concept levels). After generating the frequent itemsets, the proposed work generates the frequent closed itemsets.

Mining Closed Frequent Itemsets in Multi-Level Datasets

Currently there is no dedicated approach for generating frequent closed itemsets and their generators from multi-level frequent itemsets. However, the frequent closed itemset techniques developed for single-level rule mining can be adapted. Thus, to generate the closed itemsets and generators for a multi-level dataset, this work uses existing single-level approaches such as CLOSE+ [38].

The CLOSE+ [38] algorithm efficiently generates the frequent closed itemsets for a single-level dataset without needing to access the dataset again; it uses the min-max basis to avoid multiple scans of the dataset. An algorithm that extracts the frequent itemsets yields a series of sets Fk, where each Fk contains all frequent itemsets of length k, known as k-itemsets.

The present work extends the CLOSE+ approach to discover association rules from a multi-level dataset. The algorithm uses generators and closed itemsets to reduce the number of rules produced during association rule discovery. A generator is a minimal itemset that identifies the items occurring frequently together in the dataset and that generates (has the same closure as) a frequent closed itemset. The Extended CLOSE+ algorithm finds the frequent closed itemsets and their generators using two propositions, which rely on the property that an itemset's support equals the support of its closure. The two propositions are:

1). The support of a generator g is strictly less than the support of each of its proper subsets g′ ⊂ g.

2). The support of a closed itemset c is strictly greater than the support of each of its proper supersets c′ ⊃ c.

The following algorithm outlines the Extended CLOSE+ process, which finds the frequent closed itemsets and generators for a multi-level dataset.

Input: sets Fk of frequent k-itemsets
Output: sets FCk of frequent k-generators, with closure and support

for k = 1 to n do
    for all itemsets l ∈ Fk do
        isgenerator := true;
        for all subsets l′ ∈ Fk−1 of l do
            if supp(l′) = supp(l) then isgenerator := false;
        end;
        if isgenerator = true then
            insert l in FCk generators with supp(l);
        end;
        isclosed := true;
        for all supersets l′′ ∈ Fk+1 of l do
            if supp(l′′) = supp(l) then isclosed := false;
        end;
        if isclosed = true then
            for n = k to 0 step −1 do
                for all generators g ∈ FCn that are subsets of l do
                    if supp(g) = supp(l) then insert l in g.closure;
                end;
            end;
        end;
    end;
end;
return the sets FCk

Extended CLOSE+ algorithm for Multi Level Association Rules.
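The generator and closed-itemset tests at the heart of this algorithm can be sketched in Python. This is a simplified illustration, not the thesis implementation; it uses a few level-2 itemsets whose supports are counted from Table 4.1.

```python
from itertools import combinations

# Supports of some level-2 itemsets, counted from Table 4.1.
supp = {
    frozenset({"1-1-*"}): 5,
    frozenset({"1-2-*"}): 4,
    frozenset({"1-1-*", "1-2-*"}): 4,
}

def is_generator(itemset):
    """A k-itemset is a generator if no (k-1)-subset has the same support."""
    return all(supp[frozenset(s)] != supp[itemset]
               for s in combinations(itemset, len(itemset) - 1)
               if frozenset(s) in supp)

def is_closed(itemset):
    """An itemset is closed if no frequent superset has the same support."""
    return all(supp[other] != supp[itemset]
               for other in supp if itemset < other)

def closure(gen):
    """Closure of a generator: its largest superset with equal support."""
    return max((s for s in supp if gen <= s and supp[s] == supp[gen]), key=len)

print(is_generator(frozenset({"1-1-*", "1-2-*"})))  # False: [1-2-*] also has supp 4
print(is_closed(frozenset({"1-2-*"})))              # False: closed by its superset
print(sorted(closure(frozenset({"1-2-*"}))))        # ['1-1-*', '1-2-*']
```

This reproduces the row of Table 4.4 in which [1-2-*] is the generator of the closed itemset [1-1-*, 1-2-*].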

The following table shows the frequent closed itemsets obtained from the example transaction dataset (Table 4.1) using the Extended CLOSE+ approach.

Closed Itemsets             Generators
[1-*-*]                     [1-*-*]
[1-1-*]                     [1-1-*]
[1-1-1]                     [1-1-1]
[1-*-*, 2-2-*]              [2-2-*]
[2-*-*, 1-1-*]              [2-*-*, 1-1-*]
[1-1-*, 1-2-*]              [1-2-*]
[1-1-*, 2-2-*]              [2-2-*]
[1-*-*, 2-2-1]              [2-2-1]
[2-*-*, 1-1-1]              [2-*-*, 1-1-1]
[1-2-*, 1-1-1]              [1-2-*, 1-1-1]
[1-*-*, 2-1-*, 2-2-*]       [2-1-*]
[2-*-*, 1-1-*, 1-2-*]       [2-*-*, 1-2-*]
[1-1-*, 1-2-*, 2-2-*]       [1-2-*, 2-2-*]
[1-1-*, 2-1-*, 2-2-*]       [2-1-*]
[1-*-*, 2-1-1, 2-2-*]       [2-1-1]
[1-1-*, 2-1-1, 2-2-*]       [2-1-1]
[1-1-*, 2-2-1, 1-2-*]       [2-2-1]
[2-1-*, 1-1-1, 2-2-*]       [2-1-*], [2-2-*, 1-1-1]
[2-2-*, 1-1-1, 2-1-1]       [2-2-1], [2-2-*, 1-1-1]

Table 4.4 Frequent closed itemsets, generators using Extended CLOSE+

Table 4.4 shows the frequent closed itemsets and their associated generators produced by the Extended CLOSE+ approach. The frequent closed itemsets generated with cross-level itemsets included are shown in Table 4.5; these frequent closed itemsets and generators come from multiple concept levels. In addition, the last two frequent closed itemsets each have two generators associated with them.

Closed Itemsets             Generators
[1-*-*]                     [1-*-*]
[1-1-*]                     [1-1-*]
[1-1-1]                     [1-1-1]
[1-*-*, 2-2-*]              [2-2-*]
[2-*-*, 1-1-*]              [2-*-*, 1-1-*]
[1-1-*, 1-2-*]              [1-2-*]
[1-1-*, 2-2-*]              [2-2-*]
[1-*-*, 2-2-1]              [2-2-1]
[2-*-*, 1-1-1]              [2-*-*, 1-1-1]
[1-2-*, 1-1-1]              [1-2-*, 1-1-1]
[1-*-*, 2-1-*, 2-2-*]       [2-1-*]
[2-*-*, 1-1-*, 1-2-*]       [2-*-*, 1-2-*]
[1-1-*, 1-2-*, 2-2-*]       [1-2-*, 2-2-*]
[1-1-*, 2-1-*, 2-2-*]       [2-1-*]
[1-*-*, 2-1-1, 2-2-*]       [2-1-1]
[1-1-*, 2-1-1, 2-2-*]       [2-1-1]
[1-1-*, 2-2-1, 1-2-*]       [2-2-1]
[2-1-*, 1-1-1, 2-2-*]       [2-1-*], [2-2-*, 1-1-1]
[2-2-*, 1-1-1, 2-1-1]       [2-1-1], [2-2-*, 1-1-1]

Table 4.5 Frequent closed itemsets and generators with cross-level using Extended CLOSE+

Mining Non-Redundant Association rules in Multi-Level Datasets

The proposed work derives a more concise set of non-redundant rules, considering rules with a minimal antecedent and a maximal consequent. The derived association rules are more concise and there is no loss of information.

The use of frequent itemsets for association rule mining often results in the generation of a large number of rules; this is a widely recognized problem. Recent work has demonstrated that using closed itemsets and generators can reduce the number of rules generated, which has greatly reduced redundancy in the rules derived from single-level datasets. Despite this, redundancy still exists in the rules generated from multi-level datasets, even when using methods designed to remove redundancy. This is hierarchical redundancy, which exists only in multi-level datasets because the dataset has a hierarchical or taxonomic structure.

First, association rules are derived from the frequent itemsets in Table 4.5. In this example, the proposed work uses the MinMaxExactRule and MinMaxApproximateRule approaches to generate the basis rules. The discovered rules come from multiple levels and can include cross-level rules, due to the cross-level frequent itemsets. The MinMaxExactRule and MinMaxApproximateRule approaches remove redundant rules, but they do not remove hierarchical redundancy, because they compare only itemsets at the same concept level. For these examples the minimum confidence threshold is set to 0.5 (50%).

No.   Exact Basis Rules           Supp    Conf
1     [1-2-*] → [1-1-*]           0.571   1.0
2     [2-2-*] → [1-1-*]           0.571   1.0
3     [2-1-1] → [1-1-1]           0.428   1.0
4     [2-1-*] → [1-1-*, 2-2-*]    0.428   1.0

No.   Approximate Basis Rules     Supp    Conf
1     [1-1-*] → [1-2-*]           0.571   0.666
2     [1-1-*] → [2-2-*]           0.571   0.666
3     [1-1-1] → [2-1-1]           0.428   0.75
4     [1-1-*] → [1-2-*, 2-2-*]    0.428   0.5
5     [1-2-*] → [1-1-*, 2-2-*]    0.428   0.75
6     [2-2-*] → [1-1-*, 1-2-*]    0.428   0.75
7     [1-1-*] → [2-1-*, 2-2-*]    0.428   0.5
8     [2-2-*] → [1-1-*, 2-1-*]    0.428   0.75

Table 4.6 Multi-level Association Rules

The proposed approach results in 4 exact and 8 approximate basis association rules using MinMaxExactRule and MinMaxApproximateRule, as shown in Table 4.6. All of these association rules are from the same concept level, so no cross-level rules are derived. The cross-level case is shown in Table 4.7, which results in 13 exact basis (and 22 approximate basis) association rules derived using the MinMaxExactRule and MinMaxApproximateRule approaches.

No.   Exact Basis Rules             Supp    Conf
1     [2-2-*] → [1-*-*]             0.571   1.0
2     [1-2-*] → [1-1-*]             0.571   1.0
3     [2-2-*] → [1-1-*]             0.571   1.0
4     [2-2-1] → [1-*-*]             0.428   1.0
5     [2-1-*] → [1-*-*, 2-2-*]      0.428   1.0
6     [2-1-*] → [1-1-*, 2-2-*]      0.428   1.0
7     [2-1-1] → [1-*-*, 2-2-*]      0.428   1.0
8     [2-1-1] → [1-1-*, 2-2-*]      0.428   1.0
9     [2-2-1] → [1-1-*, 1-2-*]      0.428   1.0
10    [2-1-*] → [1-1-1, 2-2-*]      0.428   1.0
11    [2-2-*, 1-1-1] → [2-1-*]      0.428   1.0
12    [2-1-1] → [2-2-*, 1-1-1]      0.428   1.0
13    [2-2-*, 1-1-1] → [2-1-1]      0.428   1.0

Table 4.7 Cross-level Exact association rules

Table 4.7 shows the 13 exact basis rules derived by the MinMaxExactRule algorithm, which is intended to list all important, non-redundant rules. However, redundant rules remain in the table: rule 4 is redundant to rule 1, rule 7 to rule 5, rule 8 to rule 6 and rule 12 to rule 10. For example, the item 2-2-1 (in rule 4) is a child of the more general item 2-2-* (in rule 1), so rule 4 is a more specific version of rule 1. Rule 1 says that 2-2-* implies some consequent C, whereas rule 4 requires 2-2-1 specifically; yet any item that is a descendant of 2-2-*, not just 2-2-1, will generate a rule with consequent C, so rule 4 is more restrictive. Because 2-2-1 is part of 2-2-*, rule 4 brings no new information to the user: the information it contains is already part of rule 1. Thus rule 4 is redundant.

This redundancy comes purely from the dataset having multiple concept levels through a hierarchy or taxonomy. In a flat dataset, all items are at a single concept level and are therefore unrelated. In a multi-level dataset, items span several concept levels, giving rise to super-items and sub-items; these relations between items introduce hierarchical redundancy, which must be removed from the basis rule sets. The example above showed only exact basis association rules, but this hierarchical redundancy can also occur among approximate association rules.

Non-Redundant Multi-Level/Cross-level Exact Rules

In this section, a method is proposed to determine and eliminate hierarchically redundant exact association rules from multi-level datasets. First, the proposed work defines hierarchical redundancy. Second, the definition is applied to two existing non-redundant rule extraction approaches. Finally, algorithms implementing both enhanced approaches are given.

Hierarchical Redundancy for Exact Basis: Let R1 = X1 → Y1 and R2 = X2 → Y2 be two exact association rules. Rule R1 is redundant to rule R2 if

1). at least one item in X1 is a descendant of an item in X2, and

2). at least one item in X2 is an ancestor of an item in X1, and

3). the remaining (non-ancestor) items in X2 are all present in itemset X1.

From this definition, for an exact association rule X1 → Y1, if there does not exist any other rule X2 → Y2 such that at least one item in X1 shares an ancestor-descendant relationship with an item in X2 and the other items of X2 are present in X1, then X1 → Y1 is a non-redundant rule. In addition, a rule X → Y is valid only if there is no ancestor-descendant relationship between any items in itemsets X and Y. For example, 1-2-1 → 1-2-* is not a valid rule, as an item in the antecedent shares an ancestor-descendant relationship with an item in the consequent, but 1-2-1 → 1-1-3 is a valid rule.
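A minimal sketch of the ancestor test, the rule-validity check and the hierarchical-redundancy test defined above, using the i-j-k item encoding (function names are illustrative, not the thesis code):

```python
def is_ancestor(a, b):
    """True if encoded item `a` is a proper ancestor of item `b`,
    e.g. is_ancestor('2-2-*', '2-2-1') -> True."""
    if a == b:
        return False
    return all(x == "*" or x == y
               for x, y in zip(a.split("-"), b.split("-")))

def rule_is_valid(antecedent, consequent):
    """A rule X -> Y is valid only if no item of X is an ancestor or
    descendant of an item of Y (so 1-2-1 -> 1-2-* is rejected)."""
    return not any(is_ancestor(x, y) or is_ancestor(y, x)
                   for x in antecedent for y in consequent)

def redundant_to(x1, x2):
    """Exact rule R1 (antecedent x1) is hierarchically redundant to R2
    (antecedent x2) if every item of x2 is either an ancestor of some item
    of x1 or itself present in x1, with at least one proper ancestor."""
    ancestor_found = False
    for a in x2:
        if a in x1:
            continue
        if any(is_ancestor(a, b) for b in x1):
            ancestor_found = True
        else:
            return False
    return ancestor_found

# Rule 4 ([2-2-1] -> ...) is redundant to rule 1 ([2-2-*] -> ...):
print(redundant_to({"2-2-1"}, {"2-2-*"}))   # True
print(rule_is_valid({"1-2-1"}, {"1-1-3"}))  # True
print(rule_is_valid({"1-2-1"}, {"1-2-*"}))  # False
```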

Exact rules are rules whose confidence is 1; otherwise they are approximate rules. The proposed approach is also applied to approximate rules. The following is the MinMaxExact definition with HRR (Hierarchical Redundancy Removal).

MinMaxExact with HRR Approach: For the MinMaxExact basis with HRR (MME-HRR), C is the set of discovered frequent closed itemsets. For each closed itemset c in C, Gc is the set of generators of c, and G is the set of all generators. Here g′ denotes a generator of another closed itemset c′, where c′ is the closed itemset in C derived from g′ (the closure of g′). From this, the hierarchically non-redundant exact basis MinMaxExact with HRR is defined as follows.

ReliableExactRule with HRR Approach: For the ReliableExactRule basis with HRR (RER-HRR), C is the set of discovered frequent closed itemsets. For each closed itemset c in C, Gc is the set of generators of c. In addition, G is the set of all generators, in which g1 denotes a generator of the closed itemset c1, where c1 is the closed itemset in C derived from g1 (the closure of g1). From this, the hierarchically non-redundant exact basis ReliableExactRule with HRR is defined as follows.

Proposed Algorithms for Non-Redundant Multi-Level Exact Rules

From the above definitions of HRR, MinMaxExactRule with HRR and ReliableExactRule with HRR, the proposed work develops the algorithms needed to implement the enhanced approaches for deriving non-redundant exact association rules. In the following algorithms, c is a closed itemset, C is the set of closed itemsets, g is a generator and G is the set of generators.

Input: C: a set of frequent closed itemsets; G: a set of minimal generators. For g ∈ G, g.closure is the closed itemset of g.

Output: a set of non-redundant multilevel rules.

1.  MinMaxExact := ∅
2.  for k = 1 to v do
3.      for all k-generators g ∈ Gk do
4.          nonRedundant := true
5.          if g ≠ g.closure then
6.              for all g′ ∈ G do
7.                  if g′ ≠ g then
8.                      if (g′ is an ancestor set of g) and ((c′ = c) or (g = g′)) and (g′ is not an ancestor set of c′) then
9.                          nonRedundant := false
10.                         break
11.                     end if
12.                 end if
13.             end for
14.             if nonRedundant = true then
15.                 insert {(g → c), g.supp} in MinMaxExact
16.             end if
17.         end if
18.     end for
19. end for
20. return MinMaxExact

(where c = g.closure and c′ = g′.closure)

Figure 4.2. Proposed MinMaxExact with HRR (MME-HRR) implementation.

Input: C: a set of frequent closed itemsets; G: a set of minimal generators. For g ∈ G, g.closure is the closed itemset of g.

Output: a set of non-redundant multilevel rules.

1.  ReliableExact := ∅
2.  for all c ∈ C do
3.      for all g ∈ Gc do
4.          nonRedundant := false
5.          if there is no c′ ∈ C such that c′ ⊂ c and, for some g′ ∈ Gc′, (c′ \ g′) ⊇ (c \ g) then
6.              nonRedundant := true
7.          else
8.              nonRedundant := false
9.              break
10.         end if
11.         for all g1 ∈ G do
12.             if g1 ≠ g then
13.                 if (g1 is an ancestor set of g) and (c = c1 or g = g1) and (g1 is not an ancestor set of c1 \ g1) and (g1 is not a descendant set of c1 \ g1) then
14.                     nonRedundant := false
15.                     break
16.                 end if
17.             end if
18.         end for
19.         if nonRedundant = true then
20.             insert {(g → c \ g), g.supp} in ReliableExact
21.         end if
22.     end for
23. end for
24. return ReliableExact

(where c1 = g1.closure)

Figure 4.3 ReliableExactRule with HRR (RER-HRR) implementation.

The complexity of the original MinMaxExact algorithm is O(n), where n is the number of generators derived from the frequent itemsets. For MinMaxExact with HRR, before generating a rule all generators must be scanned to determine whether it is hierarchically redundant, so the complexity of MinMaxExact with HRR is O(n²). The original ReliableExactRule algorithm has complexity O(n²), and the modified ReliableExactRule with HRR does not change this, i.e. it remains O(n²). For large datasets, the O(n²) complexity means the two proposed enhanced methods may have efficiency problems; this issue will need to be addressed in the future.

Experiments

Experiments were conducted with the above algorithms to test and evaluate the effectiveness of the proposed hierarchically non-redundant exact basis and to confirm that it is a lossless basis set. This section presents the implementation details and results. Six datasets, named A to F, were used to test whether the approach reduces the size of the exact basis rule set and whether the basis set is lossless, meaning all the rules can be recovered. These datasets ranged from 100 to 5000 transactions (100, 200, 500, 2000 and 5000 among them). The key statistics for these datasets are detailed in Table 4.8.

Dataset    MME    MMEHR    RE    REHR
A           15       10    13       9
B          106       68    80      58
C          174      134   113      89
D          577      429   383     305
E          450      405   315     287
F          725      602    91      80

Table 4.8 Number of exact basis rules obtained by each approach

Fig. 4.4. Multi-level association rules

The above algorithms were applied to datasets A, B, C, D, E and F. From these experiments, the following was observed.

For dataset A, the rule counts for MinMaxExact (MME) and MinMaxExact with HRR (MMEHR) are almost the same across the different user-specified minsupp values (with minconf set to 1.00), because the number of rows in this dataset is very small. The exact and reliable rules are few, so the redundant rules are also few; in this case the removal of redundant rules is minimal.

For dataset B, the MMEHR and REHR approaches reduce the number of exact rules compared with MME and RE. MMEHR generates fewer rules than MME, removing the hierarchical redundancy present in dataset B. The RE approach derives the reliable rules from those generated by MME and MMEHR, so the number of rules generated by RE must be equal to or less than the number generated by MMEHR; Figure 4.4 confirms this. Hence the proposed MMEHR and REHR approaches reduce the hierarchical redundancy.

For dataset C, the number of exact rules is gradually reduced by MME, MMEHR, RE and REHR in turn. That is, the exact rules generated by MME contain hierarchical redundancy, which is removed by MMEHR; RE then checks whether the rules are reliable, and REHR removes the remaining hierarchical redundancy at the multi- and cross-level. This shows that the algorithms can be applied one after another to obtain quality, accurate rules.

For datasets D and E, many more exact rules are generated by the MME approach. These may contain many hierarchically redundant rules, which are eliminated using the proposed MMEHR approach; for these datasets the numbers of rules generated by MMEHR are similar. There may still be redundancy at cross levels, which is eliminated using the RE and REHR approaches. The rule counts decrease at each stage, so the proposed approach fully removes the hierarchically redundant rules.

Dataset F generates a large number of exact rules with the MME approach, containing many hierarchically redundant rules, all of which are removed by MMEHR. The RE and REHR rules were then generated to remove the ancestor-descendant rules at different levels. Figure 4.4 shows that dataset F contains many hierarchically redundant rules, removed by MMEHR, and fewer ancestor-descendant rules, eliminated by REHR.

All of the above datasets are used to analyse the proposed work under different conditions; in every case the proposed work reduced the hierarchical redundancy.

Non-Redundant Multi-Level Approximate Rules

This section outlines how to determine and eliminate hierarchically redundant approximate association rules from multi-level datasets. The hierarchical redundancy definition is applied to two existing non-redundant rule extraction approaches, defining an enhanced implementation of both. Finally, algorithms implementing both enhanced approaches are outlined.

Approach For Non-Redundant Multi-Level Approximate Rules

Previously, hierarchical redundancy was defined for exact association rules, which have a confidence value of 1 (see Section 3.4.2). Here hierarchical redundancy is defined for approximate association rules, whose confidence value is less than 1. Approximate association rules are measured by their confidence, which indicates their strength, trustworthiness, accuracy and/or reliability, so it is important to ensure that rules with high confidence are kept and made available. The proposed algorithms therefore consider both the content of a rule and its confidence: a rule that would be considered hierarchically redundant due to the content of its antecedent is not treated as redundant if its confidence is greater than that of the rule making it redundant. This is because the proposed work keeps the rules with the highest confidence in the approximate basis rule set.

MinMaxApprox Basis (MMA): Let C be the set of frequent closed itemsets and G be the set of minimal generators of the frequent closed itemsets in C. The basis consists of the approximate rules g → c \ g for g ∈ G and c ∈ C with γ(g) ⊂ c, where γ(g) is the closure of g.

Reliable Approximate Basis (RAB): Let C be the set of frequent closed itemsets and G be the set of minimal generators of the frequent closed itemsets in C, where γ(g) is the closure of g, γ(g′) is the closure of g′ and conf refers to the confidence measure.

In the present work, hierarchical redundancy for approximate association rules is defined as follows.

Hierarchical Redundancy for Approximate Basis: Let R1: X1 → Y1 with confidence C1 and R2: X2 → Y2 with confidence C2 be two approximate association rules. Rule R1 is redundant to rule R2 if

at least one item in X1 is a descendant of an item in X2, and

at least one item in X2 is an ancestor of an item in X1, and

the other (non-ancestor) items in X2 are all present in itemset X1, and

the confidence of R1 (C1) is less than or equal to the confidence of R2 (C2).

From this definition, an approximate association rule X1 → Y1 is non-redundant if

there does not exist any other approximate rule X2 → Y2 such that at least one item in X1 shares an ancestor-descendant relationship with X2 (containing the ancestor(s)) and all other items of X2 are present in X1, or

the confidence of rule X1 → Y1 is greater than the confidence of rule X2 → Y2, where X2 → Y2 is a more general/abstract version of X1 → Y1.

To test for hierarchical redundancy, this definition is applied to a rule set along with one further condition for a rule to be considered valid. A rule X → Y is valid if there is no ancestor-descendant relationship between any items in itemsets X and Y. For example, 1-2-1 → 1-2-* is not a valid rule, but 1-2-1 → 1-1-3 is a valid rule. If this condition is not met by a rule X2 → Y2 when testing whether X1 → Y1 is redundant to X2 → Y2, then X1 → Y1 is non-redundant, as X2 → Y2 is not a valid rule.
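A minimal sketch of this validity test, assuming the hierarchy-coded item strings shown in the example above (where '*' marks a generalised level, so '1-2-*' is an ancestor of '1-2-1'); the function names are illustrative, not from the thesis implementation:

```python
def is_ancestor(a: str, d: str) -> bool:
    """True if item code `a` (e.g. '1-2-*') is a strict ancestor of `d`
    (e.g. '1-2-1'): every level matches exactly or is generalised to '*'."""
    pa, pd = a.split("-"), d.split("-")
    return a != d and len(pa) == len(pd) and all(
        x == y or x == "*" for x, y in zip(pa, pd))

def is_valid_rule(antecedent: set, consequent: set) -> bool:
    """A rule X -> Y is valid only if no item in X has an
    ancestor-descendant relationship with any item in Y."""
    return not any(is_ancestor(x, y) or is_ancestor(y, x)
                   for x in antecedent for y in consequent)

print(is_valid_rule({"1-2-1"}, {"1-2-*"}))  # False: consequent generalises antecedent
print(is_valid_rule({"1-2-1"}, {"1-1-3"}))  # True: no hierarchical overlap
```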

The proposed work removes this hierarchical redundancy, using closed itemsets and generators to discover the non-redundant approximate basis rules.

MinMaxApprox with HRR Basis (MMA-HRR): Let C be the set of frequent closed itemsets and G be the set of minimal generators of the frequent closed itemsets in C. Then

MMA-HRR = {r: g → (c \ g) | g ∈ G, c ∈ C, γ(g) ⊂ c, ¬∃ g′ ∈ G, c′ ∈ C such that g is a descendant set of g′ and g′ is an ancestor set of g, (c′ \ g′) = (c \ g), g′ contains no ancestors or descendants of (c′ \ g′), and conf(g′ → (c′ \ g′)) ≥ conf(g → (c \ g))}

where γ(g) is the closure of g, γ(g′) is the closure of g′ and conf refers to the confidence measure of an association rule.

Reliable Approximate Basis with HRR (RAB-HRR): Let C be the set of frequent closed itemsets and G be the set of minimal generators of the frequent closed itemsets in C. Then

RAB-HRR = {r: g → (c \ g) | g ∈ G, c ∈ C, γ(g) ⊂ c, ¬∃ g1 ∈ G, c1 ∈ C such that g is a descendant set of g1 and g1 is an ancestor set of g, c1 = c, g1 contains no ancestors or descendants of (c1 \ g1), and conf(g1 → (c1 \ g1)) ≥ conf(g → (c \ g))}

where γ(g) is the closure of g, γ(g1) is the closure of g1 and conf refers to the confidence measure of an association rule.

Algorithm For Non-Redundant Multi-Level Approximate Rules

Using the two definitions MMA-HRR and RAB-HRR (where HRR stands for Hierarchical Redundancy Removal), together with the definition of hierarchical redundancy for approximate basis association rules, the proposed method builds the basis for generating non-redundant multi-level approximate basis association rules.

In the proposed work, the necessary algorithms are developed to implement the enhanced approaches for deriving non-redundant approximate association rules. In the following algorithms, c is a closed itemset, C is the set of closed itemsets, g is a generator, G is the set of generators, conf refers to the confidence measure of a given rule, minconf is the minimum confidence threshold (i.e. all rules must have a confidence value above this threshold in order to be considered strong) and γ(g) is the closure of g.

Input: Set of Frequent Closed Itemsets and generator

Output: Set of Non-Redundant multi-level approximate basis rules

MinMaxApprox
    nonRedundant = true
    for k = 1 to (maximum generator length - 1) do
        for all k-length generators g ∈ G do
            for all c ∈ C with length(c) > k and γ(g) ⊆ c do
                if supp(c) / supp(g) ≥ minconf then
                    for all k-length generators g′ ∈ G where g′ ≠ g do
                        for all c′ ∈ C with length(c′) > k and γ(g′) ⊆ c′ do
                            if supp(c′) / supp(g′) ≥ minconf then
                                if (g′ is an ancestor of g) and ((c′ \ g′) = (c \ g)) and
                                   not (g′ is an ancestor of (c′ \ g′)) and
                                   (supp(c′) / supp(g′) ≥ supp(c) / supp(g)) then
                                    nonRedundant = false
                                    break out of the two inner for loops
                                endif
                            endif
                        endfor
                    endfor
                    if nonRedundant then
                        insert {r: g → (c \ g), supp(c) / supp(g)} into MinMaxApprox
                    endif
                    nonRedundant = true
                endif
            endfor
        endfor
    endfor
    return MinMaxApprox

Figure 3.12. Proposed MinMaxApprox with HRR (MMA-HRR) implementation.
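The pruning step of Figure 3.12 can be sketched in Python as follows. This is a simplified, hypothetical rendering rather than the thesis implementation: rules are given as (antecedent, consequent, confidence) triples over hierarchy-coded items, and a rule is dropped when a more general rule with the same consequent and at least equal confidence exists:

```python
def is_ancestor(a: str, d: str) -> bool:
    """Strict ancestor test on hierarchy codes: '1-2-*' generalises '1-2-1'."""
    pa, pd = a.split("-"), d.split("-")
    return a != d and len(pa) == len(pd) and all(
        x == y or x == "*" for x, y in zip(pa, pd))

def is_ancestor_set(g1: frozenset, g2: frozenset) -> bool:
    """True if every item of g1 equals or generalises some item of g2, g1 != g2."""
    return g1 != g2 and all(
        any(a == d or is_ancestor(a, d) for d in g2) for a in g1)

def mma_hrr_filter(rules):
    """Keep g -> y unless another rule g' -> y has an ancestor antecedent,
    no ancestor link into its own consequent, and confidence >= conf(g -> y)."""
    kept = []
    for g, y, conf in rules:
        redundant = any(
            yp == y and cp >= conf and is_ancestor_set(gp, g)
            and not any(is_ancestor(a, b) for a in gp for b in yp)
            for gp, yp, cp in rules)
        if not redundant:
            kept.append((g, y, conf))
    return kept

rules = [
    (frozenset({"1-2-1"}), frozenset({"2-1-1"}), 0.6),  # specific rule
    (frozenset({"1-2-*"}), frozenset({"2-1-1"}), 0.7),  # more general, higher confidence
]
print(mma_hrr_filter(rules))  # only the more general, higher-confidence rule survives
```

Had the specific rule carried the higher confidence, both rules would have been kept, matching the definition's confidence condition.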

Input: Set of Frequent Closed Itemsets and generator

Output: Set of Non-Redundant multi-level approximate basis rules

ApproxBasis
    nonRedundant = true
    for each c ∈ C do
        for each g ∈ G such that γ(g) ⊆ c do
            if supp(c) / supp(g) ≥ minconf then
                if for all g′ ⊂ g: conf(g → (c \ g)) > conf(g′ → (c \ g′)) then
                    for all g1 ∈ G where g1 ≠ g do
                        if g1 is an ancestor of g then
                            for all c1 ∈ C do
                                if γ(g1) ⊆ c1 and (c = c1) and
                                   not (g1 is an ancestor of (c1 \ g1)) and
                                   (supp(c1) / supp(g1) ≥ supp(c) / supp(g)) then
                                    nonRedundant = false
                                    break out of the two inner for loops
                                endif
                            endfor
                        endif
                    endfor
                    if nonRedundant then
                        insert {r: g → (c \ g), supp(c) / supp(g)} into ApproxBasis
                    endif
                    nonRedundant = true
                endif
            endif
        endfor
    endfor
    return ApproxBasis

Figure 3.13. Proposed Reliable Approximate Basis with HRR (RAB-HRR) implementation.

The complexity of the original MinMaxApprox is O(n²), where n is the number of generators derived from the frequent itemsets. MinMaxApprox with HRR is also O(n²), so the proposed approach has the same complexity as the single-level approach. Likewise, the original ReliableApproximateRule is O(n²), and the enhanced ReliableApproximateRule with HRR remains O(n²). For large datasets the O(n²) complexity may cause efficiency problems, but the proposed enhanced versions are no worse than the original approaches. The issue of efficiency will need to be addressed in future work.

Experimental Results

Experiments were conducted to test and evaluate the effectiveness of the proposed approach in deriving a hierarchically non-redundant approximate basis rule set that is also lossless. This section presents and details the experiments undertaken and the results achieved.

In the proposed method, six datasets are used to test whether the approach discovers reliable rules, reduces the size of the approximate basis rule set, and remains lossless. The process of discovering the association rules involves three steps. First, the frequent itemsets are discovered using a minimum support value for each hierarchy level. Second, the frequent closed itemsets and generators are derived from the frequent itemsets. Finally, the association rules are built.
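The second step, deriving closed itemsets and minimal generators from the frequent itemsets, can be illustrated with a small brute-force sketch (toy transactions, not one of the thesis datasets):

```python
from itertools import combinations

def frequent_itemsets(transactions, minsupp):
    """All itemsets whose support count meets minsupp (brute force, for illustration)."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if set(cand) <= t)
            if s >= minsupp:
                freq[frozenset(cand)] = s
    return freq

def closed_and_generators(freq):
    """Closed itemset: no proper superset with equal support.
    Minimal generator: no proper subset with equal support."""
    closed = {x for x, s in freq.items()
              if not any(x < y and s == sy for y, sy in freq.items())}
    gens = {x for x, s in freq.items()
            if not any(y < x and s == sy for y, sy in freq.items())}
    return closed, gens

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
freq = frequent_itemsets(transactions, minsupp=2)
closed, gens = closed_and_generators(freq)
print(sorted(map(sorted, closed)))  # [['a'], ['a', 'b'], ['a', 'c']]
print(sorted(map(sorted, gens)))    # [['a'], ['b'], ['c']]
```

Here {b} is a minimal generator whose closure is the closed itemset {a, b}, since both have support 2.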

For all of the testing undertaken, the minimum confidence threshold for the association rules was set at 0.5. Table 4 presents the results obtained from the datasets, showing the number of approximate basis rules obtained and the percentage reduction achieved.

DataSet | Approx Rules | MMA  | MMA with HRR | RAB  | RAB with HRR
--------|--------------|------|--------------|------|-------------
A       |     36       |   27 |     28       |   26 |     34
B       |     22       |   20 |     20       |   20 |     20
C       |    181       |  161 |    166       |  146 |    147
D       |    700       |  587 |    398       |  347 |    347
E       |   2546       | 2085 |   1608       | 1387 |   1332
F       |   6427       | 4844 |   3415       | 2970 |   2267

Table 4: Results (numbers of approximate basis rules) for the different datasets
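The percentage reduction each basis achieves over the full approximate rule set can be computed directly from the counts in Table 4:

```python
# Columns: (Approx rules, MMA, MMA with HRR, RAB, RAB with HRR) from Table 4.
table = {
    "A": (36, 27, 28, 26, 34),
    "B": (22, 20, 20, 20, 20),
    "C": (181, 161, 166, 146, 147),
    "D": (700, 587, 398, 347, 347),
    "E": (2546, 2085, 1608, 1387, 1332),
    "F": (6427, 4844, 3415, 2970, 2267),
}

for name, (total, *bases) in table.items():
    cuts = ", ".join(f"{100 * (total - b) / total:.1f}%" for b in bases)
    print(f"{name}: reduction vs full approximate set -> {cuts}")
```

For dataset F, for example, RAB with HRR removes roughly 64.7% of the 6427 approximate rules.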

Fig 1. Generated Multi level Association Rules for A, B, C, D, E and F

The above algorithms were applied to datasets A, B, C, D, E and F, from which the following observations were made.

For datasets A and B, the MMA rules and MMA-HRR rules are almost the same across the different user-specified minsupp values when the user-specified confidence minconf < 1.00, because the number of rows in these datasets is very small. Consequently, the approximate and reliable rules are few, and so are the redundant rules; the removal of redundant rules is therefore minimal. To describe these datasets clearly, the rule extraction for datasets A, B and C is shown in Figure 2.

For dataset C, the numbers of approximate rules are almost the same for all approaches. This indicates that the dataset contains few hierarchical or ancestor-descendant relationships across levels.

For dataset D, the approximate rules generated with the MMA approach may contain hierarchically redundant rules, which are eliminated using the proposed MMA-HRR approach. There is also a minimal amount of redundancy at cross levels (ancestor-descendant relationships at different levels), which is eliminated using the RAB and RAB-HRR approaches. These approaches gradually decrease the number of MinMax approximate rules; thus the proposed approach fully removes the hierarchically redundant rules.

For dataset E, more approximate rules are generated with the MMA approach. These may contain many hierarchically redundant rules, which are eliminated using the proposed MMA-HRR approach. There may still be redundancy at cross levels, which is eliminated using the RAB and RAB-HRR approaches. The number of rules therefore decreases gradually across the approaches; hence the proposed approach fully removes the hierarchically redundant rules.

Dataset F generates a large number of approximate rules using the MMA approach. These contain many hierarchically redundant rules, all of which are removed with the MMA-HRR approach. The proposed work then generates the RAB and RAB-HRR rules to remove the ancestor-descendant rules at different levels. The figure above shows that dataset F contains the most hierarchically redundant rules, which are removed with MMA-HRR, and fewer ancestor-descendant rules, which are eliminated with RAB-HRR.

All of the above datasets are used to analyse the proposed work under different conditions; in every case the proposed work reduced the hierarchical redundancy.

The number of approximate rules is very small for dataset B, whereas dataset F contains a large number of approximate rules. The approximate rules for datasets A and B are almost the same. Datasets A, B and C are shown in Figure 2 to give a clearer representation of their approximate rules. For all datasets, the number of reliable rules is smaller than the number of approximate rules.

Fig 2. Generated Multi level Association Rules for A, B & C

As Figure 2 shows, the number of rules for dataset B is the same for all approaches, since the dataset contains very few rows.

For dataset A, the MMA rules are more numerous, whereas the rules generated with MMA-HRR, RAB and RAB-HRR are the same. This indicates that no further hierarchical redundancy remains in this dataset after the first removal step.

For dataset C, the MMA count is high and may include redundant rules; these are progressively reduced with MMA-HRR, RAB and RAB-HRR.

The following figure presents the analysis of the three datasets D, E and F.

Fig 3. Generated Multi level Association Rules for D, E & F

The number of rules for dataset D decreases gradually across the different approaches; hence the proposed approaches remove both the hierarchically redundant rules and the ancestor-descendant cross-level rules.

For datasets E and F, the MMA rules contain hierarchical redundancy, which is removed with MMA-HRR. The resulting rules contain redundancy at different levels, which is removed by RAB. The RAB set still contains ancestor-descendant rules, which are removed by RAB-HRR. The proposed algorithms thus reduce the redundant rules and yield higher-quality, lossless association rules.

SUMMARY

Redundancy in association rules affects the quality of the information presented, and this has a negative effect on its usefulness. The goal of redundancy elimination is to improve the quality and usefulness of the rules. Our work aims to remove hierarchical redundancy in multi-level and cross-level rules from multi-level datasets, thus reducing the size of the rule set to improve its quality and usefulness while remaining lossless. We have proposed and outlined approaches that remove the hierarchical redundancy of both exact and approximate association rules through the use of a dataset's hierarchy/taxonomy structure and the ancestor/descendant relations between the topics contained within it. We have also outlined algorithms that allow for the lossless recovery of both exact and approximate association rules that our proposed redundancy removal approaches deemed hierarchically redundant, thus allowing the full exact and/or approximate association rule sets to be derived if needed.

Note: The following research papers related to this chapter have been published:

R. Vijaya Prakash, A. Govardhan, SSVN. Sarma, "Concise Representation of Multi-level Association Rules using MinMaxExact Rules", International Journal of Advanced Research in Computer Science and Electronics Engineering, Volume 1, Issue 5, July 2012, ISSN: 2277-9043.

R. Vijaya Prakash, A. Govardhan, SSVN. Sarma, "Mining Non-Redundant Frequent Patterns in Multi-Level Datasets Using Min Max Approximate Rules", International Journal of Computer Engineering and Technology (IJCET), Volume 3, Issue 2, July-September 2011, ISSN: 0976-6367 (Print), ISSN: 0976-6375 (Online).


