The Existing Data Mining Tools

02 Nov 2017

As long as data mining is regarded as ‘the key driver of new opportunities and new ideas’, there are questions that need to be answered. How will data mining support Business Intelligence and Knowledge Management for better decision making? And, more importantly, will data mining always bring benefits and value to organizations? In this article we first introduce the concepts and technologies related to data mining, namely Business Intelligence, Knowledge Management and the data warehouse, and explain how they relate to each other. We then introduce data mining and show how it helps organizations analyse raw data and obtain useful information. We explain the benefits of data mining and then describe the areas and situations in which data mining can prove useless to an organization and cause its goals to fail. Among the causes of failure, we found Information Quality to be the most important one to consider. We therefore explain in detail some key criteria for the gathered data and information that are vital to benefiting from data mining.

Keywords: Business Intelligence, Knowledge Management, Data warehouse, Data mining, Information Quality

Introduction

Today every organization or company has one vital asset: ‘information’. Two technologies have always been central to improving the quantitative and qualitative value of the information available to decision makers: Business Intelligence and Knowledge Management. Managers have recognized that timely, accurate knowledge can mean improved business performance, which shows how important these technologies are for an organization. Organizations usually face a serious challenge: how to handle the massive amount of data that they hold, generate, collect and store. "There is a need to have a technology that can access, analyze, summarize, and interpret information intelligently and automatically. Responding to this challenge, the field of data mining has emerged" (Ying, et al., 2008). Organizations use data mining tools to analyze data, discover answers to previously unasked questions and make better decisions. The source of data for data mining is the data warehouse. A data warehouse is a central repository that integrates data from different sources; in other words, it centralizes them. It supports complex reporting and data analysis, such as quarterly or annual comparisons or trend reports. For data mining to succeed, however, information quality is a critical factor.

Business Intelligence

"Business intelligence (BI) is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information" (Wikipedia, the free encyclopedia, n.d.). With BI, an organization can handle large amounts of information to identify and develop new opportunities. BI also helps an organization decide on effective strategies that can provide a competitive market advantage and stability.

Business Intelligence derived from decision-support technology in the 1970s. It then went through a complex and gradual evolution, with stages including the Transaction Processing System (TPS), Executive Information System (EIS), Management Information System (MIS) and Decision Support System (DSS). In 1996, the Gartner Group defined BI as a series of systems, comprising data warehouses, data analysis and data mining, that help an organization make better decisions and keep its leading position in a competitive market. Business intelligence uses information about a company’s past performance to predict its future performance. BI can also reveal emerging trends from which the company might profit.

BI technologies give you three views of your business operations: historical, current and predictive. Common functions of BI technologies include reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.

One of the main goals of deploying business intelligence is to support better business decisions. In this sense, a BI system can be called a decision support system (DSS).

Knowledge Management

Knowledge Management (KM) is a concept and a term that arose around two decades ago, in approximately 1990. Very early on in the KM movement, Davenport (1994) offered the still widely quoted definition:

"Knowledge management is the process of capturing, distributing, and effectively using knowledge."

The above definition has the advantage of being simple, stark, and to the point. A few years later, the Gartner Group created a second definition of KM, which is perhaps the most frequently cited one (Duhon, 1998):

"Knowledge management is a discipline that promotes an integrated approach to identifying, capturing, evaluating, retrieving, and sharing all of an enterprise's information assets. These assets may include databases, documents, policies, procedures, and previously un-captured expertise and experience in individual workers."

Both definitions have a very corporate and organizational orientation. KM, historically at least, is primarily about managing the knowledge of and in organizations.

Data Warehouse

An enterprise data warehouse is an excellent source of data to locate and mine. By its nature, a data warehouse should already hold most of the pertinent data selected by analysts and business users. In addition, this data is organized and stored for the explicit purpose of reporting. The data warehouse is the main source for data mining because the data within it has already undergone significant additions, revisions, modifications and purging based on business rules and processes.

Data Mining

In today’s world, every business, company and organization holds a large amount of data of its own. They usually use this data for future decisions, research and development. The data in their databases is at hand whenever they require it, but the most important step is to analyze it and find the important information. To grow rapidly, an organization must make quick and accurate decisions to seize opportunities while they are available (Arthur, n.d.).

A typical data warehouse implementation lets users ask and answer questions such as "How many cars were sold, by area and by agency, between March and May 2008?" Data mining, on the other hand, lets business decision makers ask and answer questions such as "What factors increase the rate of sales in a specific region in a specific quarter?" or "What are the best times of year to hold a sale, and what are the best areas in which to open more shops and provide more services for customers?"
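The first kind of question is a plain aggregation over stored records. A minimal sketch of the car sales question follows, assuming a hypothetical list of sales tuples; the record layout, the sample figures and the function name are illustrative and do not come from any real warehouse.

```python
from collections import Counter
from datetime import date

# Hypothetical sales records: (sale date, area, agency, cars sold).
SALES = [
    (date(2008, 2, 20), "North", "Agency A", 3),
    (date(2008, 3, 5), "North", "Agency A", 4),
    (date(2008, 4, 17), "North", "Agency B", 2),
    (date(2008, 5, 30), "South", "Agency C", 5),
    (date(2008, 6, 1), "South", "Agency C", 1),
]


def cars_sold_by_area_and_agency(sales, start, end):
    """Sum cars sold per (area, agency) for sale dates in [start, end]."""
    totals = Counter()
    for sold_on, area, agency, count in sales:
        if start <= sold_on <= end:
            totals[(area, agency)] += count
    return dict(totals)


# Cars sold between 1 March and 31 May 2008, by area and agency.
report = cars_sold_by_area_and_agency(SALES, date(2008, 3, 1), date(2008, 5, 31))
```

A query like this only summarizes what is already recorded. The second kind of question asks which factors drive the numbers, and answering it requires fitting a model to the data.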

Data mining allows users to sift through the data in data warehouses and obtain an enormous amount of information; this process gives access to the gems of business intelligence. In essence, data mining is about refining data and extracting the valuable information it contains. Data mining is the process of extracting hidden knowledge from large volumes of raw data; it can also be defined as the process of extracting hidden predictive information from large databases (Chaterjee, n.d.). The data mining process uses the data in the enterprise data warehouse. For the analysis to work, the data must be granular enough. "Data that is characterized by significant aggregations beyond the original grain of the data will not produce significant results when used to create or test against a mining model" (Chaterjee, n.d.).

The process of data mining is mainly divided into three steps:

Pre-processing: collecting a large amount of relevant data

Mining: classifying and clustering the data, correcting errors and linking information

Validation: establishing trust in the new information
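The three steps can be sketched end to end on a toy problem. The example below is only an illustration under stated assumptions: the customer rows, the field names "visits" and "bought", and the one-feature threshold rule are all hypothetical, and a real project would use a proper mining tool rather than this hand-written classifier.

```python
# Hypothetical raw rows: number of shop visits and whether the customer bought.
RAW_TRAINING = [
    {"visits": 1, "bought": False},
    {"visits": 2, "bought": False},
    {"visits": 3, "bought": False},
    {"visits": 6, "bought": True},
    {"visits": 7, "bought": True},
    {"visits": 8, "bought": True},
    {"visits": None, "bought": True},  # incomplete row, dropped in pre-processing
]
RAW_HOLDOUT = [
    {"visits": 2, "bought": False},
    {"visits": 9, "bought": True},
    {"visits": 5, "bought": True},
    {"visits": 4, "bought": False},
]


def preprocess(raw_rows):
    """Pre-processing: keep only complete rows, as (feature, label) pairs."""
    return [
        (float(row["visits"]), row["bought"])
        for row in raw_rows
        if row.get("visits") is not None and row.get("bought") is not None
    ]


def mine(training):
    """Mining: pick the visit threshold that best separates buyers from non-buyers."""
    candidates = sorted({x for x, _ in training})

    def accuracy(threshold):
        return sum((x >= threshold) == label for x, label in training) / len(training)

    return max(candidates, key=accuracy)


def validate(threshold, holdout):
    """Validation: measure how often the rule is right on rows it has not seen."""
    return sum((x >= threshold) == label for x, label in holdout) / len(holdout)


threshold = mine(preprocess(RAW_TRAINING))
holdout_accuracy = validate(threshold, preprocess(RAW_HOLDOUT))
```

The validation score is computed on rows that were not used to learn the rule, which is what allows any trust to be placed in the new information.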

Benefits of Data Mining for Organizations

Fast and Feasible Decisions

Searching for information in a huge amount of data by hand takes a lot of time and is tedious for the person doing it. Tedious manual work increases the chance of mistakes and incorrect decisions, and a frustrated person is unlikely to decide accurately. With the help of data mining, you can get information easily and make fast, sound decisions. Data mining also helps to compare information across various factors, so the decisions become more reliable.

Powerful Strategies

The information that data mining makes available can be used to build different strategies. In other words, by analyzing the information along various dimensions you can devise different strategies and implement them. This can help the organization expand its business boundaries effectively and make sound decisions.

Competitive Advantage

With this information in hand, you should compare it from different angles, carry out competitive analysis and make corrective decisions. This enables the company to gain a competitive advantage.

How could data mining be profitable?

A wide range of companies have deployed data mining successfully. The technology is applicable to any company looking to leverage a large data warehouse to better manage its customer relationships. Early adopters have tended to be in information-intensive industries such as financial services and direct mail marketing.

Two factors are critical to successful data mining:

A large, well-integrated data warehouse

A well-defined understanding of the business process within which data mining is to be applied

Types of companies that have used data mining successfully include:

Pharmaceutical companies

Credit card companies

Transportation companies

Large consumer packaged goods companies (to improve the sales process to retailers)

The above examples share a clear common interest: they use the knowledge about customers implicit in a data warehouse to reduce costs and improve the value of customer relationships. Such organizations can focus their efforts on the most profitable customers and prospects, and design targeted marketing strategies to best reach them.

Information Quality

In summer 2005, scientists reported a problem with the quality of information gathered from satellites. The satellites were collecting data at the equator and had reported a stable, cooling temperature trend. The reality was different: there was a pattern of global warming. The discrepancy arose because the satellites had drifted off course and were reporting as daytime temperatures readings that were in fact taken at night. This simple example shows how important trends can go unseen, unnoticed, misidentified or misinterpreted if there are information quality problems anywhere in the information value chain.

Different sources of error can hamper the results of data mining and data analysis. The goal of this research is not to examine each one in depth, but to identify the cases in which data mining does not work effectively for the organization, in other words, does not bring it value. Sometimes there are mismatches in the data, perhaps because it has not been properly, clearly and accurately defined. The recorded characteristics of the real-world object should also be accurate and up to date; for example, when the price of an item changes, the updated price should be used. For an analyst, both the validity and the accuracy of the data are important. Another reason for failure can be transforming the data incorrectly, so that the data mining tool cannot analyze it. Finally, the way the analysis is presented and displayed is very important too. The following points should be considered, each with a short description and example:

Correct, complete and clear information

Imagine a "Salary" attribute in a database that holds different kinds of values, such as "2000$", "Less than 1000$", "1500£" or "Not specified". This shows that the data format plays a very important role when gathering data. Information producers should also receive some training.
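One way to make such a field usable is to parse every raw value into a single structured form and to flag what cannot be parsed. The sketch below handles only the four example values above; the function name, the output fields and the reading of "$" as US dollars are assumptions made for illustration.

```python
import re

# Matches an optional "less than" prefix, an amount, then a currency symbol.
_SALARY = re.compile(
    r"^(?P<less>less than\s+)?(?P<amount>\d+(?:\.\d+)?)\s*(?P<symbol>[$£])$",
    re.IGNORECASE,
)

# Assumption: "$" means US dollars; the source data does not say which dollar.
CURRENCY_CODES = {"$": "USD", "£": "GBP"}


def parse_salary(raw):
    """Return a structured salary, or None when the value cannot be interpreted."""
    match = _SALARY.match(raw.strip())
    if match is None:
        return None
    return {
        "amount": float(match.group("amount")),
        "currency": CURRENCY_CODES[match.group("symbol")],
        "upper_bound_only": match.group("less") is not None,
    }


parsed = [parse_salary(v) for v in ["2000$", "Less than 1000$", "1500£", "Not specified"]]
```

Values that come back as None can be counted and sent back to the information producers, rather than silently entering the mining step.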

Exact measurement to avoid data collection errors

The earlier example of the satellites and global warming is a measurement error. To avoid such errors, measurement devices should be verified periodically and checked or calibrated as needed.

Missing or inaccurate values

Sometimes values are missing because of data collection errors. Missing values also occur because of incomplete customer responses and for many other reasons.
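Before mining, it helps to know how much of each attribute is actually missing. The following is a small sketch, assuming rows are held as dictionaries and that None or an empty string marks a missing value; the sample survey rows and field names are hypothetical.

```python
# Hypothetical customer survey rows; None and "" both mark a missing answer.
SURVEY = [
    {"age": 34, "income": 42000},
    {"age": None, "income": 38000},
    {"age": 51, "income": ""},
    {"age": 29},  # "income" was never recorded for this row
]


def missing_rate(rows, fields):
    """Return the fraction of rows in which each field is absent, None or blank."""
    rates = {}
    for field in fields:
        missing = sum(1 for row in rows if row.get(field) in (None, ""))
        rates[field] = missing / len(rows)
    return rates


rates = missing_rate(SURVEY, ["age", "income"])
```

An attribute with a high missing rate may need to be re-collected or excluded before it is fed to a mining model.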

Value synonyms

Imagine that two different values, such as "Doz" and "Dz", have been used for the same thing because the data does not have a standardized value set. These are value synonyms, and they can cause data mining to fail.
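A standardized value set can be enforced with a lookup table that maps every known synonym to one canonical value. The sketch below assumes that "Doz" and "Dz" both mean "dozen"; the table and the function name are illustrative.

```python
# Assumed synonym table: every known spelling maps to one canonical value.
UNIT_SYNONYMS = {"doz": "dozen", "dz": "dozen", "dozen": "dozen"}


def standardize_unit(raw):
    """Map a raw unit spelling to its canonical value, or fail loudly."""
    key = raw.strip().rstrip(".").lower()
    if key not in UNIT_SYNONYMS:
        raise ValueError(f"unknown unit: {raw!r}")
    return UNIT_SYNONYMS[key]


units = [standardize_unit(u) for u in ["Doz", "Dz", " dz."]]
```

Raising an error for unknown spellings, instead of passing them through, keeps new synonyms from slipping into the data unnoticed.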

Currency

Currency represents the age of the data. Different data mining goals may need information of different ages. For example, a company's recruiting rules or an organization's insurance rules change over time, so you may not be able to apply the rules of 15 years ago to today's data, or to data from the past 4 years, for instance.
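A check on the age of the data can be applied before analysis by keeping only records that took effect recently enough for the goal at hand. This is a minimal sketch; the rule records, the "effective" field and the four-year window are hypothetical.

```python
from datetime import date

# Hypothetical rule records with the date on which each took effect.
RULES = [
    {"name": "recruiting-rule-v1", "effective": date(2002, 1, 1)},
    {"name": "recruiting-rule-v2", "effective": date(2014, 6, 1)},
    {"name": "recruiting-rule-v3", "effective": date(2016, 3, 15)},
]


def years_before(day, years):
    """Return the same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def recent_enough(records, as_of, max_age_years):
    """Keep records whose effective date is within max_age_years of as_of."""
    cutoff = years_before(as_of, max_age_years)
    return [r for r in records if r["effective"] >= cutoff]


current_rules = recent_enough(RULES, as_of=date(2017, 11, 2), max_age_years=4)
```

The right window depends on the mining goal, so the maximum age is a parameter rather than a fixed constant.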

Outliers and anomalies

Values that do not fit the expected value, the expected set of values or the range of valid values are called outliers.
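Both forms of the definition, a valid range and a valid set of values, translate directly into simple checks. The sketch below uses hypothetical age and gender-code columns; the limits and allowed codes are assumptions chosen for the example.

```python
def outside_range(values, low, high):
    """Return the values that fall outside the valid range [low, high]."""
    return [v for v in values if not (low <= v <= high)]


def outside_set(values, allowed):
    """Return the values that are not members of the expected set."""
    return [v for v in values if v not in allowed]


# Hypothetical columns: ages should lie in 0..120, codes should be "M" or "F".
age_outliers = outside_range([34, 29, -1, 41, 230], low=0, high=120)
code_outliers = outside_set(["M", "F", "X", "F"], allowed={"M", "F"})
```

Whether a flagged value is an error or a genuine anomaly worth studying still has to be decided by someone who knows the data.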

Mapping categorical data to numeric values

Where data is categorical, it is better to map it to numeric values. In trend analysis, the relative relationships among categorical values or attribute codes are not easy to interpret: they have to be interpreted for correlation, and numeric values serve this purpose much better than alphabetical ones.
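Once categories are mapped to numbers, a correlation can be computed directly. The sketch below assumes an ordinal attribute with the hypothetical levels "low", "medium" and "high" and made-up purchase counts; the mapping is only meaningful when the categories have a natural order.

```python
from math import sqrt

# Assumed ordinal mapping: the numeric order mirrors the order of the categories.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data: customer satisfaction level and number of repeat purchases.
satisfaction = ["low", "medium", "high", "low", "high", "medium"]
repeat_purchases = [2, 4, 6, 2, 6, 4]

encoded = [LEVELS[level] for level in satisfaction]
correlation = pearson(encoded, repeat_purchases)
```

For categories without a natural order, such as region names, a numeric code would impose an ordering that does not exist, so a different encoding would be needed.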

Modelling errors (Correlated attributes)

There should be no redundant attributes that convey the same information. For example, storing both "age" and "date of birth", or both "gender" and "personal title", in a database causes redundancy.
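One simple way to spot this kind of redundancy is to test whether one attribute fully determines another. The sketch below checks this on hypothetical customer rows; the field names and values are illustrative, and a result of True on a sample is only a hint, not proof that the attributes are always redundant.

```python
# Hypothetical customer rows with a personal title and a gender code.
CUSTOMERS = [
    {"title": "Mr", "gender": "M"},
    {"title": "Mrs", "gender": "F"},
    {"title": "Ms", "gender": "F"},
    {"title": "Mr", "gender": "M"},
]


def determines(rows, source, target):
    """True if each value of `source` always appears with the same `target` value."""
    seen = {}
    for row in rows:
        # Remember the first target seen for this source value, then compare.
        if seen.setdefault(row[source], row[target]) != row[target]:
            return False
    return True


title_gives_gender = determines(CUSTOMERS, "title", "gender")
gender_gives_title = determines(CUSTOMERS, "gender", "title")
```

Here the title determines the gender but not the other way round, so "gender" adds no information once "title" is present and can be left out of the model.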


