Why Is Association Mining Important


02 Nov 2017


Although OLAP tools support multidimensional analysis and decision making, additional data analysis tools are required for in-depth analysis, such as data classification, clustering, and the characterization of data changes over time. In addition, huge volumes of data can be accumulated beyond databases and data warehouses. Typical examples include the World Wide Web and data streams, where data flow in and out like streams, as in applications like video surveillance, telecommunication, and sensor networks. The effective and efficient analysis of data in such different forms becomes a challenging task.

The abundance of data, coupled with the need for powerful data analysis tools, has been described as a data rich but information poor situation. The fast-growing, tremendous amount of data, collected and stored in large and numerous data repositories, has far exceeded our human ability for comprehension without powerful tools. As a result, data collected in large data repositories become "data tombs": data archives that are seldom visited. Consequently, important decisions are often made based not on the information-rich data stored in data repositories, but rather on a decision maker's intuition, simply because the decision maker does not have the tools to extract the valuable knowledge embedded in the vast amounts of data.

In addition, consider expert system technologies, which typically rely on users or domain experts to manually input knowledge into knowledge bases. Unfortunately, this procedure is prone to biases and errors, and is extremely time-consuming and costly. Data mining tools perform data analysis and may uncover important data patterns, contributing greatly to business strategies, knowledge bases, and scientific and medical research. The widening gap between data and information calls for a systematic development of data mining tools that will turn data tombs into "golden nuggets" of knowledge.

What Is Data Mining?

Discover information that is "hidden" in the data:
• associations (e.g. linking purchase of pizza with beer)
• sequences (e.g. tying events together: marriage and purchase of furniture)
• classifications (e.g. recognizing patterns such as the attributes of employees that are most likely to quit)
• forecasting (e.g. predicting buying habits of customers based on past patterns)

Expert systems or small ML/statistical programs

Simply stated, data mining refers to extracting or "mining" knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, data mining should have been more appropriately named "knowledge mining from data," which is unfortunately somewhat long. "Knowledge mining," a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material. Thus, such a misnomer that carries both "data" and "mining" became a popular choice. Many other terms carry a similar or slightly different meaning to data mining, such as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

1. Briefly explain data mining functionalities and explain what kind of patterns can be mined. (8) (Anna Univ ME Dec 2008)

2. What is data mining? (2) Sastra Univ, B Tech Nov 2007

3. Explain the various data mining functions to be performed. (2) Sastra Univ, B Tech May 2005

Why Data Mining? Potential Applications

• Direct marketing: identify which prospects should be included in a mailing list
• Market segmentation: identify common characteristics of customers who buy the same products
• Market basket analysis: identify what products are likely to be bought together
• Insurance claims analysis: discover patterns of fraudulent transactions and compare current transactions against those patterns

Data Mining: What's in a Name?

Data Mining, Knowledge Mining, Knowledge Discovery in Databases, Data Archaeology, Data Dredging, Database Mining, Knowledge Extraction, Data Pattern Processing, Information Harvesting, Siftware

The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of stored data, using pattern recognition technologies and statistical and mathematical techniques.

What can data mining do?

• Classification
  - Classify credit applicants as low, medium, high risk
  - Classify insurance claims as normal, suspicious
• Estimation
  - Estimate the probability of a direct mailing response
  - Estimate the lifetime value of a customer
• Prediction
  - Predict which customers will leave within six months
  - Predict the size of the balance that will be transferred by a credit card prospect
• Association
  - Find out items customers are likely to buy together
  - Find out what books to recommend to Amazon.com users
• Clustering
  - Difference from classification: classes are unknown!

Knowledge discovery as a process consists of an iterative sequence of the following steps:

1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods are applied in order to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)

Data Mining: On What Kind of Data?

• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series data and temporal data
  - Text databases and multimedia databases
  - Heterogeneous and legacy databases
  - WWW

[Figure: Knowledge Discovery in Databases: Process. Stages shown: Data, Selection, Target Data, Preprocessing, Preprocessed Data, Data Mining, Patterns, Interpretation/Evaluation, Knowledge. Adapted from: U. Fayyad, et al. (1995), "From Knowledge Discovery to Data Mining: An Overview," Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press.]

Steps of a KDD Process

• Learning the application domain: relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of effort!)
• Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation
• Choosing functions of data mining (summarization, classification, regression, association, clustering)
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
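The seven knowledge discovery steps listed earlier (cleaning, integration, selection, transformation, mining, evaluation, presentation) can be sketched as a chain of small functions. This is only an illustrative toy in Python: the two record sources, the field names, and the 50% threshold are all made up for the example, and the "mining" step is a simple pair count rather than any particular published algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources (invented for illustration).
store_a = [
    {"id": 1, "items": "bread, milk"},
    {"id": 2, "items": None},  # noisy record: nothing was recorded
    {"id": 3, "items": "bread, milk, eggs"},
]
store_b = [
    {"id": 4, "items": "milk, eggs"},
    {"id": 5, "items": "bread, milk"},
]

def clean(records):
    """Step 1: drop records with missing item data."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Step 2: combine several cleaned sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Step 3: keep only the field relevant to the analysis."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Step 4: turn each comma-separated string into a set of items."""
    return [frozenset(s.strip() for s in text.split(",")) for text in item_strings]

def mine(baskets):
    """Step 5: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(pair_counts, total, min_support):
    """Step 6: keep pairs whose support reaches the threshold."""
    return {pair: c / total for pair, c in pair_counts.items() if c / total >= min_support}

def present(patterns):
    """Step 7: print the surviving patterns in a readable form."""
    for pair, s in sorted(patterns.items()):
        print(f"{pair[0]} + {pair[1]}: support {s:.0%}")

baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
present(patterns)
```

In practice the process is iterative, so the output of the evaluation step would often send the analyst back to an earlier step with different selections or thresholds.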

1. Explain the steps of knowledge discovery in databases with a neat sketch. (8) Anna Univ ME(CSE) Dec 2007

2. List and discuss the various data mining techniques. (8) Anna Univ ME(CSE) Dec 2007

3. Explain the knowledge discovery process in detail. (15) Sastra Univ, B Tech Nov 2006

4. Briefly explain data processing technologies. (8) Sastra M Tech(CSE) Dec 2003

5. State the performance issues in data mining. (8) Sastra Univ, B Tech(Hons) Dec 2005

6. State the importance of Statistics in Data mining. (2) Anna Univ ME(CSE) Dec 2007

7. Define the term KDD. (2) Sastra Univ, B Tech Dec 2005

8. Define KDD. (2) Sastra Univ, B Tech Nov 2006

9. Compare Data mining with knowledge discovery in database. (2) Sastra Univ, B Tech May 2005

List out any four data mining tools. (2) Anna Univ ME(CSE) Dec 2007

1. Data mining is an essential step in the process of knowledge discovery in databases. Give your comments. (9) Sastra Univ, B Tech (Hons) Dec 2005

2. Explain the architecture of a typical data mining system. (15) Sastra Univ, B Tech Dec 2005

3. Data mining as a step in the process of knowledge discovery: explain. (5) Sastra Univ, B Tech Nov 2007

Data Mining

In principle, data mining should be applicable to any kind of data repository, as well as to transient data, such as data streams. Data repositories include relational databases, data warehouses and transactional databases, advanced database systems, flat files, data streams, and the World Wide Web. Advanced database systems include object-relational databases and specific application-oriented databases, such as spatial databases, time-series databases, text databases, and multimedia databases. The challenges and techniques of mining may differ for each of the repository systems.

Architecture of a Typical Data Mining System

[Figure: architecture of a typical data mining system. Components shown: databases and data warehouse (with data cleaning, data integration, and filtering), database or data warehouse server, data mining engine, pattern evaluation, graphical user interface, and knowledge base.]

A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data. The software programs involve mechanisms for the definition of database structures; for data storage; for concurrent, shared, or distributed data access; and for ensuring the consistency and security of the information stored, despite system crashes or attempts at unauthorized access. A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values. A semantic data model, such as an entity-relationship (ER) data model, is often constructed for relational databases. An ER data model represents the database as a set of entities and their relationships.

Transactional Databases

In general, a transactional database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number (trans ID) and a list of the items making up the transaction (such as items purchased in a store). The transactional database may have additional tables associated with it, which contain other information regarding the sale, such as the date of the transaction, the customer ID number, the ID number of the salesperson and of the branch at which the sale occurred, and so on.

Major Issues in Data Mining (1)

• Mining methodology and user interaction
  - Mining different kinds of knowledge in databases
  - Interactive mining of knowledge at multiple levels of abstraction
  - Incorporation of background knowledge
  - Data mining query languages and ad-hoc data mining
  - Expression and visualization of data mining results
  - Handling noise and incomplete data
  - Pattern evaluation: the interestingness problem
• Performance and scalability
  - Efficiency and scalability of data mining algorithms
  - Parallel, distributed and incremental mining methods

Major Issues in Data Mining (2)

• Issues relating to the diversity of data types
  - Handling relational and complex types of data
  - Mining information from heterogeneous databases and global information systems (WWW)
• Issues related to applications and social impacts
  - Application of discovered knowledge: domain-specific data mining tools, intelligent query answering, process control and decision making
  - Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
  - Protection of data security, integrity, and privacy

1. Explain the major issues in Data Mining. (8) Sastra M Tech (CSE) Dec 2003

2. What are the major issues in data mining? Explain. (8) Anna Univ ME(CSE) Dec 2007

3. Explain any four issues in data mining. (10) Sastra Univ, B Tech Nov 2007

4. State the performance issues of data mining. (2) Sastra Univ, B Tech Nov 2007

5. Explain the data mining issues. (15) Sastra Univ, B Tech May 2005

Multi-Dimensional View of Data Mining

• Data to be mined: relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
• Knowledge to be mined: characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.; multiple/integrated functions and mining at multiple levels
• Techniques utilized: database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
• Applications adapted: retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, Web mining, etc.

1. List and discuss the various data mining techniques. (8) Anna Univ ME(CSE) Dec 2007

2. List the techniques used for data mining. (2) Sastra Univ, B Tech Nov 2006

Example: Use in retailing

• Goal: improved business efficiency
  - Improve marketing (advertise to the most likely buyers)
  - Inventory reduction (stock only needed quantities)
• Information source: historical business data
  - Example: supermarket sales records
  - Size ranges from 50k records (research studies) to terabytes (years of data from chains)
  - Data is already being warehoused
• Sample question: what products are generally purchased together?
• The answers are in the data, if only we could see them

Date/Time/Register   Fish   Turkey   Cranberries   Wine   ...
12/6 13:15 2         N      Y        Y             N      ...
12/6 13:16 3         Y      N        N             Y      ...

Data Mining and Business Intelligence

[Figure: pyramid with increasing potential to support business decisions toward the top. Layers, from top to bottom: Making Decisions; Data Presentation (visualization techniques); Data Mining (information discovery); Data Exploration (statistical analysis, querying and reporting); Data Warehouses / Data Marts (OLAP, MDA); Data Sources (paper, files, information providers, database systems, OLTP). Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA.]

1. Compare Data warehouse and Data Mining. (5) Sastra M Tech(CSE) Dec 2003

2. How do data warehousing and OLAP relate to data mining? (10) (Anna Univ ME Dec 2008)

3. Define Data Mining and explain the steps in the data mining process. (8) (Anna Univ ME Dec 2008)

Association Rules

The purchasing of one product when another product is purchased represents an association rule. These rules are used by retail stores to assist in marketing, advertising, floor placement and inventory control. Association rules are frequently used to show the relationships between data items; they detect common usage of items. A database in which an association rule is to be found is viewed as a set of tuples, where each tuple contains a set of items. The support of an item or set of items is the percentage of transactions in which that item or set of items occurs. Given a target domain, the underlying set of items usually is known, so that an encoding of the transactions could be performed before processing. Association rules can be applied to data domains other than categorical.

Definition 1: Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn}, where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I, an association rule is an implication of the form X ⇒ Y, where X and Y are sets of items called itemsets and X ∩ Y = ∅.

Definition 2: The support (s) for an association rule X ⇒ Y is the percentage of transactions in the database that contain both X and Y, that is, X ∪ Y.

Definition 3: The confidence or strength (α) for an association rule X ⇒ Y is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X.

Definition 4: Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn}, the association rule problem is to identify all association rules X ⇒ Y with a minimum support and confidence.
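Definitions 2 and 3 translate almost directly into code. The following is a minimal Python sketch, assuming each transaction is stored as a set of item names; the five pizza-and-beer transactions are invented for illustration and the function names are not from any library.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]

def count(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)

def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Definition 2: fraction of transactions containing the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Definition 3: count(X and Y together) / count(X) for the rule X => Y."""
    x, y = frozenset(x), frozenset(y)
    x_count = count(x, transactions)
    if x_count == 0:
        raise ValueError("X does not occur in any transaction")
    return count(x | y, transactions) / x_count

# Hypothetical transactions, made up for this example.
transactions = [frozenset(t) for t in (
    {"pizza", "beer"},
    {"pizza", "beer", "chips"},
    {"pizza", "soda"},
    {"beer", "chips"},
    {"pizza", "beer", "soda"},
)]

rule_support = support({"pizza", "beer"}, transactions)         # 3 of 5 transactions
rule_confidence = confidence({"pizza"}, {"beer"}, transactions)  # 3 of the 4 pizza transactions
print(f"pizza => beer: support {rule_support:.0%}, confidence {rule_confidence:.0%}")
```

Here the rule pizza ⇒ beer has 60% support and 75% confidence; under Definition 4 it would be reported only if both values meet the chosen minimums.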

Why Is Association Mining Important?

Foundation for many essential data mining tasks

Association, correlation, causality

Sequential patterns, temporal or cyclic association, partial periodicity, spatial and multimedia association

Associative classification, cluster analysis, iceberg cube, fascicles (semantic data compression)

Broad applications

Basket data analysis, cross-marketing, catalog design, sale campaign analysis

Web log (click stream) analysis, DNA sequence analysis, etc.

1. Consider the University Database. Given an association rule Student(X, CSE) ^ works(X, hard) ⇒ Top(X, CSE). The total number of students is 10000. A typical statistic gives the number of CSE students as 1000. The number of hard-working students is 450 and toppers are predicted to be 390. Calculate the support and confidence. (7) Sastra M Tech(CSE) Dec 2003

2. With a relevant example explain the single dimensional Boolean association rule mining algorithm from transactional database. (16) Anna Univ ME(CSE) Dec 2007

3. A database has four transactions. Let min_support be 60% and min_conf = 80%.

TID    Date       Items_bought
T100   10/15/99   (K, A, D, B)
T200   10/15/99   (D, A, C, E, B)
T300   10/19/99   (C, A, B, E)
T400   10/22/99   (B, A, D)

Find all frequent itemsets using Apriori and FP-growth respectively. Compare the efficiency of the two mining processes. (10) (Anna Univ Dec 2008)

How might the efficiency of Apriori be improved? Discuss. (6) (Anna Univ Dec 2008)
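As a hedged illustration of the Apriori half of this question, the sketch below runs a level-wise search over the four transactions T100 to T400 with min_support = 60%. It is a compact teaching version in Python, not an optimized implementation, and it does not cover FP-growth; the function name and data layout are choices made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset meeting min_support."""
    n = len(transactions)

    def count(candidate):
        return sum(1 for t in transactions if candidate <= t)

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        candidate = frozenset([item])
        c = count(candidate)
        if c / n >= min_support:
            current[candidate] = c

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: scan the database and keep candidates that are frequent.
        counted = {c: count(c) for c in candidates}
        current = {c: v for c, v in counted.items() if v / n >= min_support}
    return frequent

transactions = [
    frozenset("KADB"),   # T100
    frozenset("DACEB"),  # T200
    frozenset("CABE"),   # T300
    frozenset("BAD"),    # T400
]

frequent = apriori(transactions, min_support=0.6)
for itemset, c in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), c)
```

With four transactions, 60% support means an itemset must appear in at least three of them, so the search keeps A, B, D, the three pairs formed from them, and {A, B, D}. The prune step is what distinguishes Apriori from plain enumeration: a candidate is counted only if all of its smaller subsets were already found frequent.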


1. What is association mining?(2) Sastra Univ, B Tech Dec 2005

2. What is association analysis? (2) Sastra Univ, B Tech Nov 2007

3. State the Apriori Algorithm. (5) MCSE216E15 May 2009

4. Find the Association Rules with 50% support and 75% confidence in the following transactions consisting of six items. (10) MCSE216E15 May 2009

Transaction ID   Items
100              Bread, Cheese, Eggs, Juice
200              Bread, Cheese, Juice
300              Bread, Milk, Yogurt
400              Bread, Juice, Milk
500              Cheese, Juice, Milk
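One way to check an answer to this exercise is to enumerate candidate rules mechanically. The Python sketch below uses plain brute-force enumeration over all itemsets (not Apriori), which is workable only because there are just six items; the function name and the tuple layout of the result are assumptions made for the example.

```python
from itertools import combinations

# The five transactions from the table above.
transactions = {
    100: {"Bread", "Cheese", "Eggs", "Juice"},
    200: {"Bread", "Cheese", "Juice"},
    300: {"Bread", "Milk", "Yogurt"},
    400: {"Bread", "Juice", "Milk"},
    500: {"Cheese", "Juice", "Milk"},
}

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)

def mine_rules(min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule meeting both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions.values()))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            together = count(itemset)
            if together / n < min_support:
                continue  # itemset is not frequent, so no rule can come from it
            # Try every non-empty proper subset as the left-hand side.
            for r in range(1, size):
                for lhs in combinations(combo, r):
                    x = frozenset(lhs)
                    conf = together / count(x)
                    if conf >= min_confidence:
                        rules.append((sorted(x), sorted(itemset - x), together / n, conf))
    return rules

rules = mine_rules(min_support=0.5, min_confidence=0.75)
for lhs, rhs, s, c in rules:
    print(f"{lhs} => {rhs}: support {s:.0%}, confidence {c:.0%}")
```

With five transactions, 50% support requires at least three occurrences, so only {Bread, Juice} and {Cheese, Juice} qualify as frequent pairs, and each yields rules in both directions that meet the 75% confidence threshold.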


