The History Of Data Mining

Published Date: 02 Nov 2017

Data mining now plays an important role in finance, purchasing, marketing and sales. Decision makers can benefit from it in many aspects of forecasting: data mining establishes a forecasting-based environment that can be used in many areas, and it supports strategic plans that affect profitability. In recent years, traditional processes have given way to modern time series based methods. This essay first explains the concept of data mining for forecasting. It then discusses why industry needs data mining, the advantages of data mining for forecasting, the limitations of classical univariate forecasting, and the prerequisites for successfully implementing data mining for a time series approach. Finally, it explains the process and methods of data mining for forecasting.

"Traditional data mining processes, methods and technology oriented to static-type data has grown immensely in the last quarter century [such as] Fayyad (1996), Cabena (1998), Berry (2000), Pyle (2003), Duling, Thompson (2005), Rey, Calos (2005), Kurgan, Musilek (2006), Han, Kamber (2012)" (Rey and Wells, p.35). These authors contributed to data mining methods and built prediction models on data that has no time series structure. Traditional data mining methods lost their appeal as new time series techniques emerged. Time series based methods are useful when applied to data mining: they give more accurate results for inventory costs, revenue optimization and customer loyalty. They also help experts who forecast for their own companies.

"Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks. In general, data mining tasks can be classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in the database. Predictive mining tasks perform inference on the current data in order to make predictions." (Han, Kamber, p.7) In each case, the modeler uses a data mining system that can mine various kinds of data.

"Three prerequisites need to be considered in the successful implementation of a data mining for a time series approach: 1. understanding the usefulness of forecasts at different time horizons, 2. differentiating planning and forecasting and 3. getting all stakeholders on the same page in forecast implementation" (Rey and Wells, p. 35). Traditional and time series data mining differ in their time horizons. The time horizons are divided into three groups: short term forecasts cover one to three years, medium term forecasts three to five years and long term forecasts five to ten years. Companies develop their own strategic plans and specify which time horizon they are going to use. For instance, strategy groups require medium to long range forecasts for strategic planning, while marketing and sales organizations need short to medium range forecasts for planning purposes (Rey and Wells, p. 35). These decisions are crucial for the accuracy of forecasts, because a company may lose millions of dollars if it decides wrongly. Separating planning from forecasting is also vital. Companies and decision makers should attach their aspirations to a plan; treating a plan as a forecast is misleading. "Plans are what we feel we can do, while forecasts are mathematical estimates of what is most likely" (Rey and Wells, p.35). The two are different values; however, both need to be accurate. Finally, although many groups within an organization have common forecasting needs, they do not use the same data. This inconsistency between groups harms the organization and can result in "rework and/or mismanagement" (Rey and Wells, p.35).

Big data is crucial in data mining for forecasting. In recent years, the amount of time series based data has increased. Data services provide millions of historical time series collected over time; however, they do not provide forecasts of the data collected. The SAP system plays an important role in collecting and managing data. It offers businesses a way to obtain historical time series data for price, volume and costs. As in multiple regression, independent variables that are themselves time series are needed to forecast the dependent variable, so time series modeling is important in this case. Simple regression and correlation techniques can mislead business users because they relate one time series variable to another while ignoring possible serial correlation. The other crucial point is the necessity of statistical forecasting. When time series data is prepared for forecasting, it should be decomposed into seasonality, trend cycles and error terms. As mentioned above, the independent variables (Xs) are the major components that drive the dependent variable (Y). Univariate forecasts are advantageous and more accurate in the short run (Rey and Wells, p.36). However, in the long run their error terms grow considerably, so they are not as effective as multivariate forecasts. "The use of exogenous variable forecasting not only manifests itself in potentially more accurate for price, demand, costs, etc., in the future, but it is also provides a basis for understanding the timing of changes in economic activity" (Rey and Wells, p. 36).
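The decomposition into trend, seasonality and error terms mentioned above can be illustrated with a minimal sketch. It assumes a classical additive model (series = trend + seasonal + error) estimated with a centered moving average, which is one common textbook choice and not necessarily the method Rey and Wells use. The function name `decompose_additive` and the quarterly `sales` figures are hypothetical and chosen only for illustration.

```python
from statistics import mean


def decompose_additive(series, period):
    """Split a series into trend, seasonal and error parts (additive model)."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # undefined at the edges, where the window does not fit
    for t in range(half, n - half):
        if period % 2 == 0:
            # Centered average for an even period: half weight on both end points.
            window = series[t - half:t + half + 1]
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(series[t - half:t + half + 1])
        trend[t] = total / period

    # Seasonal effect: average detrended value at each position in the cycle.
    detrended = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            detrended[t % period].append(series[t] - trend[t])
    raw = [mean(values) for values in detrended]
    centre = mean(raw)  # force the seasonal effects to sum to zero
    cycle = [value - centre for value in raw]
    seasonal = [cycle[t % period] for t in range(n)]

    # Error term: whatever trend and seasonality leave unexplained.
    error = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, error


# Hypothetical quarterly sales: a linear trend plus a fixed seasonal pattern.
pattern = [5, -3, -4, 2]
sales = [100 + 2 * t + pattern[t % 4] for t in range(12)]

trend, seasonal, error = decompose_additive(sales, period=4)
print("trend:", trend)
print("seasonal cycle:", seasonal[:4])
print("error:", error)
```

The series needs to span at least two full cycles so that every seasonal position has a detrended value to average. On real data the error terms will not vanish as they do for this constructed example, and their size gives a first impression of how much a univariate model leaves unexplained.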

Various authors, e.g. Hand (1998), Glymour (1997) and Kantardzic (2011), have described the difference between data mining and the classical statistical framework. The classical statistical framework follows these steps: "First a particular research objective is sought. These objectives are often driven by first principles or the physics of the problem. This objective is then specified in the form of a hypothesis; from there a particular statistical model is proposed, which then is reflected in a particular experimental design. These experimental designs make the ensuing analysis much easier in that the Xs are independent or orthogonal to one another. This orthogonality leads to perfect separation of the effects of the drivers. The data is then collected, the model is fit and all previously specified hypotheses are tested using specific statistical approaches. Thus very clean and specific cause and effect models can be built." (Rey and Wells, p. 37)

In business, by contrast, data may contain many X and Y variables without any specific objective or hypothesis. Therefore, "data have irrelevant and redundant candidate explanatory variables" (Rey and Wells, p. 37). Redundancy among explanatory variables is called multicollinearity: the X variables are related to each other, so the correlations between them can be close to 1 in absolute value. Data mining therefore applies statistical and machine learning methods to find the specific X variables that predict the Y variables. The procedure is the same for time series data. When a multivariate time series model is built, a small set of specific Xs is chosen, because the Y variable is forecasted using the best X variables. "Rey and Calos (2005) review the data mining and modeling process used at The Dow Chemical Company. A common theme in all of these processes is that there are many Xs, and thus some methodology is necessary to reduce the number of Xs provided as input to the particular modeling method of choice." (Rey and Wells, p. 38)
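One simple way to spot redundant candidate Xs is to compute the pairwise correlations and flag the pairs that are almost perfectly correlated. The sketch below assumes plain Pearson correlation and a cut-off of 0.95; the cut-off, the function names and the three example variables are illustrative assumptions and do not come from the source.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def redundant_pairs(candidates, threshold=0.95):
    """Return pairs of X variables whose |correlation| is at or above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(candidates.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical candidate explanatory variables.
candidates = {
    "price_index": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "price_index_eur": [0.9, 1.8, 2.7, 3.6, 4.5, 5.4],  # rescaled copy, redundant
    "promo_spend": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}

flagged = redundant_pairs(candidates)
print(flagged)
```

A flagged pair carries nearly the same information twice, so one of the two variables is a candidate for removal before modeling. For time series, a high correlation can also be produced by a shared trend alone, so a screen like this is only a first step.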

The static data mining process and data mining for forecasting are similar: both aim to reduce the number of variables. Two steps are used to do so: variable reduction and variable selection. In the variable reduction step, the goal is to reduce the number of independent variables (Xs) while still explaining the variability in the Xs. This step is often called a "non-supervised method" because the Y variables play no part in it. In the variable selection step, by contrast, the Y variables are taken into account. However, these methods can be problematic for time series data unless lags of the Xs are added, and the sampling interval of the data determines how many lags the modeler has to handle. For instance, if the data is monthly, the modeler would have to include at least twelve lags for each X, so the number of variables explodes. These lag issues in traditional data mining techniques have led to new approaches. As stated in the text (Rey and Wells, p.38), two main approaches can be used for variable reduction and/or variable selection. The first is a similarity analysis approach, which analyzes and measures the similarity of multiple time series variables. In contrast to traditional methods, in which Y is modeled as dependent on various X variables, similarity analysis orders the X series by their distance to the Y series and then applies a variable clustering algorithm to reduce both the redundancy among the Xs and their number. The second is a co-integration approach, which is used only for variable selection. "Co-integration is a test of the economic theory that two variables move together in the long run" (Rey and Wells, p.38). Traditional time series modeling differences the Y and X sequences to make them stationary; the co-integrating regression is used instead to check whether the series, taken together, have a stationary relationship.
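The lag explosion and the distance-based ordering described above can be sketched as follows. The example assumes monthly data with twelve lags per X and uses Euclidean distance between standardized series as a simple stand-in for a similarity measure; the source does not specify which distance is used. All function names and series are hypothetical, and the clustering step is left out.

```python
from math import sqrt


def add_lags(series_by_name, max_lag):
    """Build lagged copies x(t-1) .. x(t-max_lag) of every X series.

    Rows without a full lag history are dropped, so every output
    column has len(series) - max_lag values.
    """
    lagged = {}
    for name, values in series_by_name.items():
        for lag in range(1, max_lag + 1):
            lagged[f"{name}_lag{lag}"] = values[max_lag - lag:len(values) - lag]
    return lagged


def zscore(values):
    """Standardize a series to mean 0 and standard deviation 1."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def rank_by_distance(y, series_by_name):
    """Order X series by Euclidean distance to Y after standardizing both."""
    y_std = zscore(y)
    distances = {}
    for name, x in series_by_name.items():
        x_std = zscore(x)
        distances[name] = sqrt(sum((a - b) ** 2 for a, b in zip(x_std, y_std)))
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical monthly data covering two years.
y = [10 + t + (3 if t % 12 in (10, 11) else 0) for t in range(24)]
xs = {
    "gdp": [2 * t + 1 for t in range(24)],    # moves with y
    "rainfall": [(t * 7) % 5 for t in range(24)],  # unrelated short cycle
    "fuel": [50 - t for t in range(24)],      # moves against y
}

lagged = add_lags(xs, max_lag=12)
print(len(xs), "candidate Xs become", len(lagged), "lagged columns")

ranking = rank_by_distance(y, xs)
print([name for name, _ in ranking])
```

Three candidate series already produce 36 lagged columns, which shows why the number of variables grows so quickly with monthly data. Ranking the unlagged series by distance first, and adding lags only for the closest ones, keeps the candidate set small.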

Data mining techniques give practitioners access to a new generation of tools and show how to use them to make better business decisions. Data mining for forecasting is crucial because it produces high quality forecasts. It has recently become widely used in business, where it supports strategic plans that affect the profitability and accuracy of business investments.

APPENDIX

Data mining: "is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating."[1]

Forecasting: "is the process of making statements about events whose actual outcomes (typically) have not yet been observed. A commonplace example might be estimation of some variable of interest at some specified future date."[2]

Inventory Control: "is the supervision of supply, storage and accessibility of items in order to ensure an adequate supply without excessive oversupply."[3]

Dependent and Independent Variables: "The "dependent variable" represents the output or effect, or is tested to see if it is the effect. The "independent variables" represent the inputs or causes, or are tested to see if they are the cause. Other variables may also be observed for various"[4]
