Inter Transactional Pattern Discovery Applying Comparative


02 Nov 2017

Disclaimer:
This essay has been written and submitted by students and is not an example of our work. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of EssayCompany.

ABSTRACT

In this paper, a trend-pattern-based data mining approach is proposed which converts numeric stock data to symbolic notation, carries out association analysis through a comparative study of the apriori and reverse apriori concepts, and applies the mined rules to predict the further movement of prices. The initial formulation is based on inter-stock mining. A drastic reduction in the number of scans is observed: apriori needs 105 scans to perform the evaluation, whereas the extended reverse apriori covers the same evaluation in just 28 scans. The execution time is also evaluated, and reverse apriori is observed to take less time than apriori, with a difference of roughly 5221 milliseconds between the two. The comparative study is presented along with the discovery of important pattern trends, which show the investing benefits for clients in the stock market. This provides a significant way of evaluating the position of the stocks, i.e. the highest-selling and lowest-selling stocks on a daily basis. The result shows a large difference in the number of scans, which is the main motive of this study.

Keywords

Trend, execution time, scans, apriori, reverse apriori, stock price.

INTRODUCTION

Temporal data mining has proved to be a success story in the field of the stock market, where applying temporal data mining techniques has shown effective results. Knowledge is gained by effective evaluation of results. Various tools and techniques are available for performing different tasks on bulk amounts of data, and techniques such as classification, association and prediction have been utilized. Association rule mining is most commonly used to mine transaction data to find interesting associations between attributes.

1.1 Trend

Trends are patterns in the movement of prices discovered on individual time series sequences. Trends are framed by trend indicators, which are calculated by comparing the current value of an event with its previous value. A trend-pattern is then denoted by a string of trend indicators. A frequent-trend-pattern is a sub-pattern that appears frequently in the trend-pattern. The longest frequent-trend-pattern that originates in the time series is called the maximal frequent-trend-pattern.
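The conversion from prices to a trend-pattern can be illustrated with a minimal Python sketch. The symbols U (up), D (down) and S (steady), the function names and the sample prices are assumptions made for illustration; the paper does not specify its symbolic notation.

```python
def trend_indicators(prices):
    """Compare each price with the previous one and emit one symbol per step."""
    symbols = []
    for previous, current in zip(prices, prices[1:]):
        if current > previous:
            symbols.append("U")  # price moved up (assumed symbol)
        elif current < previous:
            symbols.append("D")  # price moved down (assumed symbol)
        else:
            symbols.append("S")  # price unchanged (assumed symbol)
    return "".join(symbols)


def count_sub_patterns(pattern, length):
    """Count every contiguous sub-pattern of the given length."""
    counts = {}
    for start in range(len(pattern) - length + 1):
        sub = pattern[start:start + length]
        counts[sub] = counts.get(sub, 0) + 1
    return counts


# Hypothetical prices, for illustration only.
prices = [100, 102, 105, 103, 104, 108, 107, 109, 112]
pattern = trend_indicators(prices)
sub_pattern_counts = count_sub_patterns(pattern, 2)
print(pattern)
print(sub_pattern_counts)
```

A sub-pattern whose count reaches a chosen threshold would be treated as a frequent-trend-pattern in this sketch.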

1.2 Stock in brief

The term "stock market" refers to a market where the stocks and shares of various companies are sold or purchased. It is a dynamic environment, where fluctuations can occur at any point of time. Clients can invest for the long term or the short term based on predictions. Analyzing the movements in the prices, i.e. the trends, helps in decision making.

In this paper the companies are analyzed and their trends are detected. Brokers add the stocks to a list for buying and selling purposes, which shows their current prices in the market. Data mining is a logical process intended to explore business-related applications where huge amounts of data are generated. It helps in searching for interesting patterns and associations between stock attributes, and validates the findings by applying the detected patterns to new subsets of data.

1.3 Steps in mining

The ultimate goal of data mining is prediction, and predictive data mining is the most common type of data mining and the one that has the most direct business applications. The process of data mining consists of three stages:

1) The initial exploration.

2) Model building or pattern identification with validation/verification.

3) Deployment (that is, the application of the model to new data in order to generate predictions)

(Berry and Linoff, 2000).

MOTIVATION

The primary goal of this paper is to predict intraday movements and trends in the stock market in order to decide whether to buy or sell, and to identify good companies to invest in for the long term. As more and more money is being invested, investors are becoming keyed up about future trends based on the hikes.

RELATED WORK

The stock market has long been an interesting field and has proved attractive to researchers for money-making purposes. Its main attraction is the ease of making money, if only a method to correctly predict the dynamic nature of the market could be discovered.[1] Ehsan Hajizadeh has noted that association rule mining is a mining concept that discovers useful or interesting associations or correlation-based relationships in huge amounts of data. The rules generated by association show conditions based on attributes that occur frequently. This field has received significant attention in recent years. Association rules are useful for determining correlations between attributes of a relation and have applications in the marketing, financial, and retail sectors.[2] Ehsan Hajizadeh has also elaborated the research areas covered under stock market forecasting. These include planning investment strategies, uncovering market trends, and identifying the best time to purchase stocks and which stocks to purchase.[2]

According to Markus Hegland, association rules can be defined as conditional "if-then" rules with two measures quantifying the support and confidence for a given data set. Association rules originated in market basket analysis and have become popular due to the availability of efficient algorithms.[3] Fing et al. proposed a model based on time series in which frequently occurring patterns of stocks are discovered; it was based on intra-stock mining [4]. L.K. Soon et al. [5] framed classification rules and listed stocks with a fluctuating effect in the Kuala Lumpur Composite Index. Their work indicates the inter-relationship pattern, and the stocks are evaluated on the basis of their trading performance.[5] Dr. K. Rameshkumar in [6] proposed a new algorithm (ARDM) to mine domain-extraneous rules; the paper applies it to the case histories of AIDS patients. As described in [7], Apriori, which is based on association, uses a horizontal data format, i.e. sets of frequent items are linked with each transaction. The vertical data format is different: transaction IDs (TIDs) are associated with each item set. With this format there is no requirement to scan the database, as the TID set holds the information required for calculating support [7]. In paper [8], the authors' approach was to measure the performance of association rule mining algorithms against a practically infeasible but idealized "Oracle". They presented a new online mining algorithm called ARMOR, constructed with minimal changes to Oracle. Dongsong Zhang and Lina Zhou discussed the scope of data mining in the context of financial applications. They enumerated various open issues and challenges that need to be carefully addressed for effective financial management for individuals and institutions [9].
Michael Hahsler presented a simple probabilistic framework for a transaction database to simulate transaction data in the absence of associations. The results showed that confidence is systematically influenced by frequency. Probabilistic data modeling is a valuable framework for analyzing interest measures and provides a starting point for further research [10]. [11] shows that traditional association rules are not able to produce precise rules due to the huge size of the database and the redundancy in the frequent item sets generated during the mining process; it describes the negative and positive concepts of association. In [12] an efficient algorithm called HI-mine is proposed, which makes use of a new data structure, HI-struct, to discover indirect associations between items in the data set. Simon in [13] introduced a mechanism called "Trend following (TF)", a rule-based trading mechanism which keeps a check on the movements of the long-term market trend and decides when to buy and when to sell a stock. In paper [14], an efficient algorithm called ICMiner (Inter-transaction Closed patterns Miner) is proposed for mining closed inter-transaction itemsets. It consists of two phases: first, finding the frequent items; then, enumerating closed inter-transaction itemsets through an itemset–dataset tree, called an ID-tree.[14]

STOCK STATUS PREDICTION

Problem statement

The stock market is a very unpredictable and dynamic environment. Traditional techniques alone cannot discover all the important relations between stocks, so statistics and data mining can be combined to get better results. The problem is divided into 5 phases. A preprocessed snapshot is shown below.

The paper makes a comparative study of apriori and reverse apriori and finds the trends of stocks for investment purposes. This is very useful information for the broker: it is easy to get the highest or lowest status of an individual company, but getting the patterns is a difficult task.

Process: The following companies are evaluated:

Infosys

Coal India

Tata Motors

ICICI Bank

4.2 Preprocessing Phases

Phase 1: The stock prices are fetched for a single day from the stock market. The figure shows the preprocessing part. Traded by value is selected for evaluation.

Phase 2: The overall sum and mean of traded by value are calculated for each stock.

Phase 3: The values are converted to an hourly basis so that the prediction can be shown on an hourly basis.

Figure 1. Preprocessing Phase

Phase 4:

After summarization is done, the calculated mean is categorized.

Table 1: Categorization for variance

Category        Grading
1.45 & >        Highest
1.21 – 1.44     Higher
0.99 – 1.20     Average
0.71 – 0.98     Lower
0.70 & <
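The categorization can be expressed as a small lookup. The following Python sketch assumes that mean values are rounded to two decimals, so the range boundaries can be treated as lower thresholds. The label "Lowest" for the last range is an assumption, since the source lists only four gradings for five ranges; the function name is illustrative.

```python
def grade(mean_value):
    """Map a mean value to its grading using the ranges of the categorization table."""
    if mean_value >= 1.45:
        return "Highest"
    if mean_value >= 1.21:
        return "Higher"
    if mean_value >= 0.99:
        return "Average"
    if mean_value >= 0.71:
        return "Lower"
    # Assumed label: the source gives no grading for the range 0.70 and below.
    return "Lowest"


for value in (1.50, 1.30, 1.00, 0.80, 0.50):
    print(value, grade(value))
```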

The final tabulated formulation is as follows:

Table 2: Company evaluation

Here, CMP1, CMP2, CMP3 and CMP4 are the companies which are evaluated.

These groupings can be represented as:

P1 = [CMP1, CMP2]

P2 = [CMP2, CMP3]

P3 = [CMP3, CMP4]

P4 = [CMP1, CMP3]

P5 = [CMP2, CMP3, CMP4]

The companies are grouped in this way to find the result.

Phase 5 (final phase):

On the basis of the mean categorization table, the companies which have a mean value of 0.80 or higher are considered and the rest are ignored.

After applying this rule, the table formulated is as follows:

Mean ranges (columns, highest first): 1.50 & >, 1.49 – 1.15, 1.14 – 0.80

Time (Transaction ID)    Companies (cells in column order, separated by "|")
9.00 – 10.00 (T1)        CMP1,CMP2 | CMP3
10.00 – 11.00 (T2)       CMP4 | CMP1
11.00 – 12.00 (T3)       CMP1
12.00 – 1.00 (T4)        CMP3 | CMP1,CMP2
1.00 – 2.00 (T5)         CMP1,CMP2,CMP3
2.00 – 3.00 (T6)         CMP4 | CMP2
3.00 – 4.00 (T7)         CMP4 | CMP1,CMP2,CMP3
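For the mining step, each hourly slot can be read as one transaction containing the companies retained in that hour. A minimal Python sketch of this representation follows; the variable and function names are illustrative, and the mean ranges are ignored because only membership matters for the association analysis.

```python
from collections import Counter

# One transaction per hourly slot, taken from the table above.
TRANSACTIONS = {
    "T1": {"CMP1", "CMP2", "CMP3"},
    "T2": {"CMP4", "CMP1"},
    "T3": {"CMP1"},
    "T4": {"CMP3", "CMP1", "CMP2"},
    "T5": {"CMP1", "CMP2", "CMP3"},
    "T6": {"CMP4", "CMP2"},
    "T7": {"CMP4", "CMP1", "CMP2", "CMP3"},
}


def item_support(transactions):
    """Count in how many transactions each company appears."""
    counts = Counter()
    for items in transactions.values():
        counts.update(items)
    return dict(counts)


support = item_support(TRANSACTIONS)
print(sorted(support.items()))
```

The resulting counts (CMP1 in 6 slots, CMP2 in 5, CMP3 in 4, CMP4 in 3) are the 1-item supports used in the algorithm walkthroughs.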

4.3 Algorithm implementation & formulation

An efficient algorithm known as modified reverse apriori is applied here. It reduces the execution time and the number of scans. A comparative study is shown as follows:

4.3.1 Apriori Algorithm

Initialize: i := 2, Cand_set1 := all the 1-item sets;
Read the database to count the support of Cand_set1 and determine L1;
L1 := {frequent 1-item sets};
While (Li-1 ≠ {NULL}) do
Begin
    Cand_seti := candidate item sets generated from Li-1, then prune(Cand_seti);
    For all transactions t belonging to T
        Increment the count of all candidates in Cand_seti that are contained in t;
    Li := all candidates in Cand_seti with minimum support;
    i := i + 1
End
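The pseudocode can be turned into a short runnable Python sketch. It uses the seven hourly transactions from Phase 5 and a minimum support of 2, which is inferred from the worked tables (item sets with a count of 2 are kept, those with a count of 1 are dropped) rather than stated in the paper. Unlike the worked tables, which list every combination at each size, this sketch applies the standard pruning step; the function and variable names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [
    {"CMP1", "CMP2", "CMP3"},          # T1
    {"CMP4", "CMP1"},                  # T2
    {"CMP1"},                          # T3
    {"CMP3", "CMP1", "CMP2"},          # T4
    {"CMP1", "CMP2", "CMP3"},          # T5
    {"CMP4", "CMP2"},                  # T6
    {"CMP4", "CMP1", "CMP2", "CMP3"},  # T7
]


def apriori(transactions, min_support):
    """Return {item set: support} for every frequent item set."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Count the support of every candidate of the current size.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent sets into candidates one item larger, then prune
        # every candidate that has an infrequent subset.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


frequent = apriori(TRANSACTIONS, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these inputs the largest frequent item set is CMP1, CMP2, CMP3 with support 4.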

The working is as follows:

Apriori

Table 1: 1-item set

Candidate set (1-item)    Frequent set (1-item)    Database scans (no.)
CMP1 – 6                  CMP1 – 6                 7
CMP2 – 5                  CMP2 – 5                 7
CMP3 – 4                  CMP3 – 4                 7
CMP4 – 3                  CMP4 – 3                 7

Table 2: 2-item set

Candidate set (2-item)    Frequent set (2-item)    Database scans (no.)
CMP1,CMP2 – 4             CMP1,CMP2 – 4            7
CMP1,CMP3 – 4             CMP1,CMP3 – 4            7
CMP1,CMP4 – 2             CMP1,CMP4 – 2            7
CMP2,CMP3 – 4             CMP2,CMP3 – 4            7
CMP2,CMP4 – 2             CMP2,CMP4 – 2            7
CMP3,CMP4 – 1                                      7

Table 3: 3-item set

Candidate set (3-item)    Frequent set (3-item)    Database scans (no.)
CMP1,CMP2,CMP3 – 4        CMP1,CMP2,CMP3 – 4       7
CMP1,CMP2,CMP4 – 1                                 7
CMP1,CMP3,CMP4 – 1                                 7
CMP2,CMP3,CMP4 – 1                                 7

Table 4: 4-item set

Candidate set (4-item)     Frequent set (4-item)    Database scans (no.)
CMP1,CMP2,CMP3,CMP4 – 1                             7
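The scan counts in the four tables above can be tallied with a short Python sketch. It assumes, as the tables appear to, that every combination of the four companies is treated as a candidate and that each candidate costs one read of each of the 7 hourly transactions. The function and variable names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [
    {"CMP1", "CMP2", "CMP3"},          # T1
    {"CMP4", "CMP1"},                  # T2
    {"CMP1"},                          # T3
    {"CMP3", "CMP1", "CMP2"},          # T4
    {"CMP1", "CMP2", "CMP3"},          # T5
    {"CMP4", "CMP2"},                  # T6
    {"CMP4", "CMP1", "CMP2", "CMP3"},  # T7
]
ITEMS = ["CMP1", "CMP2", "CMP3", "CMP4"]


def count_scans(transactions, items):
    """Count one transaction read per candidate per transaction, grouped by size."""
    scans_by_size = {}
    support = {}
    for size in range(1, len(items) + 1):
        scans = 0
        for candidate in combinations(items, size):
            wanted = set(candidate)
            count = 0
            for t in transactions:
                scans += 1  # one read of one transaction for this candidate
                if wanted <= t:
                    count += 1
            support[candidate] = count
        scans_by_size[size] = scans
    return scans_by_size, support


scans_by_size, support = count_scans(TRANSACTIONS, ITEMS)
total_scans = sum(scans_by_size.values())
print(scans_by_size, total_scans)
```

Under this counting rule the per-size totals are 28, 42, 28 and 7.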

The total number of scans is 105. The results are now formulated again using reverse apriori.

4.3.2 Modified Reverse Apriori

The algorithm is as follows:

j = 0, counti = 0
generate CDk-j & FIk-j together and make a set of TIDs for each individual set (scan only the k-j set)
if CDk-j >= minimum_support
put it in FIk-j by union operation according to their TIDs and delete that CDk-j
j = j + 1
for every combination set of CDk-j+1 & FIk-j+1 of size k-j
perform union operation on Ck-ji, Fk-j, and if an item of Ck-j matches in Fk-j, perform union operation on FIk-j of CDk-j and FIk-j and delete that CDk-j dataset
repeat step 2 till CMP1 is true.

The working of reverse apriori is as follows:

Table 5: 4-item scan

Candidate set (4-item)          Frequent item set (4-item)    Database scans (no.)
CMP1,CMP2,CMP3,CMP4 – 1 {T7}                                  7

Table 6: 3-item scan

Candidate set (3-item)               Frequent item set (3-item)           Database scans (no.)
CMP1,CMP2,CMP3 – 1 {T7}                                                   0
CMP1,CMP2,CMP4 – 1 {T7}                                                   0
CMP1,CMP3,CMP4 – 1 {T7}                                                   0
CMP2,CMP3,CMP4 – 1 {T7}                                                   0

Continuation of 3-item set:

Candidate set (3-item)               Frequent item set (3-item)           Database scans (no.)
CMP1,CMP2,CMP3 – 4 {T7,T1,T4,T5}     CMP1,CMP2,CMP3 – 4 {T7,T1,T4,T5}     7
CMP1,CMP2,CMP4 – 1 {T7}
CMP1,CMP3,CMP4 – 1 {T7}
CMP2,CMP3,CMP4 – 1 {T7}


Table 8: 2-item scan

Candidate set (2-item)      Frequent item set (2-item)       Database scans (no.)
CMP1,CMP2 – 1 {T7}          CMP1,CMP2 – 4 {T7,T1,T4,T5}      0
CMP2,CMP4 – 1 {T7}          CMP1,CMP3 – 4 {T7,T1,T4,T5}
CMP1,CMP4 – 1 {T7}          CMP2,CMP3 – 4 {T7,T1,T4,T5}
CMP2,CMP3 – 1 {T7}
CMP3,CMP4 – 1 {T7}
CMP1,CMP3 – 1 {T7}
CMP3,CMP4 – 1 {T7}
CMP1,CMP4 – 1 {T7}

Candidate set (2-item)      Frequent item set (2-item)       Database scans (no.)
CMP2,CMP4 – 2 {T7,T6}       CMP1,CMP2 – 4 {T7,T1,T4,T5}      7
CMP1,CMP4 – 2 {T7,T2}       CMP1,CMP3 – 4 {T7,T1,T4,T5}
CMP3,CMP4 – 1 {T7}          CMP2,CMP3 – 4 {T7,T1,T4,T5}
                            CMP2,CMP4 – 2 {T7,T6}
                            CMP1,CMP4 – 2 {T7,T2}

Table 9: 1-item scan

Candidate set (1-item)    Frequent item set (1-item)       Database scans (no.)
CMP3 – 1 {T7}             CMP1 – 5 {T7,T1,T4,T5,T2}        0
CMP4 – 1 {T7}             CMP2 – 5 {T7,T1,T4,T5,T6}
                          CMP3 – 4 {T7,T1,T4,T5}
                          CMP4 – 3 {T7,T6,T2}

Candidate set (1-item)    Frequent item set (1-item)       Database scans (no.)
                          CMP1 – 6 {T7,T1,T4,T5,T2,T3}     7
                          CMP2 – 5 {T7,T1,T4,T5,T6}
                          CMP3 – 4 {T7,T1,T4,T5}
                          CMP4 – 3 {T7,T6,T2}
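The top-down idea can be sketched in Python as follows. This is a simplified illustration, not the authors' modified reverse apriori: it assumes one pass over the 7 transactions per item-set size, starting from the largest size, and records a TID list for every item set seen, so that support is read from the length of the TID list. The minimum support of 2 is inferred from the worked tables, and all names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = {
    "T1": {"CMP1", "CMP2", "CMP3"},
    "T2": {"CMP4", "CMP1"},
    "T3": {"CMP1"},
    "T4": {"CMP3", "CMP1", "CMP2"},
    "T5": {"CMP1", "CMP2", "CMP3"},
    "T6": {"CMP4", "CMP2"},
    "T7": {"CMP4", "CMP1", "CMP2", "CMP3"},
}
ITEMS = ["CMP1", "CMP2", "CMP3", "CMP4"]


def top_down_tid_lists(transactions, items, min_support):
    """Scan once per item-set size, largest first, collecting TID lists."""
    reads = 0
    frequent = {}
    for size in range(len(items), 0, -1):
        tid_lists = {}
        for tid, t in transactions.items():
            reads += 1  # one read of one transaction in this pass
            # Transactions smaller than `size` contribute no subsets.
            for subset in combinations(sorted(t), size):
                tid_lists.setdefault(subset, []).append(tid)
        for itemset, tids in tid_lists.items():
            if len(tids) >= min_support:
                frequent[itemset] = tids
    return frequent, reads


frequent, reads = top_down_tid_lists(TRANSACTIONS, ITEMS, min_support=2)
largest = next(iter(frequent))  # first frequent item set found, i.e. the largest
print(reads, largest, frequent[largest])
```

Because the passes run from the largest size downwards, the first frequent item set found is the longest one, which is the pattern of interest here.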

Thus, the final result is the pattern trend showing the hike in prices, as follows:

CMP1, CMP2, CMP3

The numbers of scans are:

Apriori    Modified reverse apriori
105        28

The execution time, measured in milliseconds (approx.), is:

Apriori      Reverse apriori
46345 ms     41124 ms
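The size of the improvement follows directly from the reported figures. The short Python sketch below only redoes the arithmetic on the numbers given above; the variable names are illustrative.

```python
apriori_scans, reverse_scans = 105, 28
apriori_ms, reverse_ms = 46345, 41124

scans_saved = apriori_scans - reverse_scans
scan_reduction_pct = 100 * scans_saved / apriori_scans

time_saved_ms = apriori_ms - reverse_ms
time_reduction_pct = 100 * time_saved_ms / apriori_ms

print(f"scans saved: {scans_saved} ({scan_reduction_pct:.1f}%)")
print(f"time saved: {time_saved_ms} ms ({time_reduction_pct:.1f}%)")
```

The reduction in scans (about 73%) is much larger than the reduction in measured time (about 11%).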

GRAPHICAL COMPARISON

The individual and overall numbers of scans performed by the two algorithms, and their execution times, are shown graphically below:

BASED ON TOTAL SCAN

BASED ON INDIVIDUAL CANDIDATE SCAN

BASED ON MILLISECONDS

RESULT

(1) The highest price movements on an hourly basis are as follows:

Time slot         Pattern trend
9.00 – 10.00      CMP2,CMP1,CMP3
10.00 – 11.00     CMP4,CMP1
11.00 – 12.00     CMP1
12.00 – 1.00      CMP3,CMP1,CMP2
1.00 – 2.00       CMP1,CMP2,CMP3
2.00 – 3.00       CMP4,CMP2
3.00 – 4.00       CMP4,CMP1,CMP2,CMP3

On the basis of the daily computations, the resulting companies are as follows: CMP1, CMP2, CMP3


