Recommendation Using Linear Regression Computer Science Essay


02 Nov 2017


Gourav Jain, Nischol Mishra, Sanjeev Sharma

School of Information Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, M.P., India, 462036.


Abstract:

A recommendation system suggests a list of the most popular items to a set of users on the basis of their interests. By applying knowledge discovery techniques, recommendation systems filter out unnecessary information for online users, and they have become some of the most powerful and admired tools in e-business. ERPM is a movie recommendation technique that overcomes the scalability and sparsity limitations of recommendation systems. It is one of the easiest methods to use, but the predictions it generates from its probability model are less accurate and take more time to calculate. To overcome this problem, we propose a novel method based on a linear regression model, named CRLRM (Category based Recommendation using Linear Regression Model), which improves both prediction accuracy and speed. The performance of our method is evaluated by MAE and shows a 30-40% improvement on ratings drawn from a data set of 100,000 ratings.

Keywords: ERPM, Regression model, MAE, Recommendation system.

1. Introduction:

With the increase in the amount of information across the world, it has become necessary to process data more quickly in a demanding environment. Two fields play a key role in processing real-world data: data mining and recommendation systems. Data mining is the process of mining out unnecessary information and summarizing the remainder in a useful form, while a recommendation system suggests that information to the user and incorporates data mining techniques, such as clustering and association rules, that help in decision making. A recommendation system therefore processes an enormous range of data and, on the basis of user interest, suggests the useful and interesting data to a set of users.

One major application of recommendation systems is online shopping, where a user looks for an item that he or she wants to purchase. An item has many features, and it is the responsibility of the recommendation system to suggest useful items to a user on the basis of the features the user requires. Recommendation systems are a prominent part of the networked environment and are applied in many areas, such as tag recommendation, music recommendation, item recommendation and movie recommendation. Among these, movie recommendation is one of the most valuable.

Jian et al. [7] proposed an algorithm for movie recommendation named ERPM, which computes predictions using a probability model. ERPM provides the easiest way of computing predictions from the whole database, but it suffers from poor prediction accuracy. To overcome this problem, a novel method based on a regression model, named CRLRM, is proposed. Instead of computing the prediction from the whole database, the proposed algorithm computes it from the movie category.

The proposed algorithm calculates the rating a user would give to a movie. The rating generated by the proposed CRLRM method and the rating generated by the existing ERPM method are then each compared with the actual rating present in the database. A smaller difference shows that the proposed method predicts more accurately and enhances the performance of the recommendation system.

The remainder of the paper is organized as follows. Section 2 describes related work, Section 3 describes the regression-based recommendation approach, Section 4 presents the experimental results and performance study, and Section 5 concludes the paper.

2. Related work:

In this section we briefly describe some of the research literature related to collaborative filtering, regression models, random walks and recommendation systems. Collaborative filtering is the fundamental method used in recommendation systems and is applied in various domains: the similarity between users and items is determined, and items are recommended to similar users. One of the earliest recommendation systems based on collaborative filtering was Tapestry [8], whose methods worked well for a small organisation where people know each other; in a large organisation or community, however, it is not possible to know every person well. Numerous studies have since been carried out on recommendation systems using collaborative filtering, with some improvement over previously implemented collaborative filtering algorithms.

Clustering is one of the techniques used in recommendation systems; it creates clusters of users with similar preferences. Xiang Cui [2] proposed a collaborative filtering algorithm based on uncertain user interests, in which clusters are formed on the basis of uncertain features. For users who belong to several clusters, the prediction is obtained by averaging across the clusters according to the degree of participation. The authors handle the uncertain features with a clustering algorithm that computes the entropy between two clusters to obtain stable classes, and they overcome the limitations of the traditional CF method on the basis of the trustworthy degree of uncertain user interest.

Another collaborative filtering method, proposed by Yi Cai [1], is typicality-based collaborative filtering recommendation (TyCo), which addresses data sparsity, recommendation inaccuracy and large prediction errors by borrowing the idea of object typicality from cognitive psychology. This method finds the neighbours of a user on the basis of the user's typicality degree in user groups, and it outperforms many CF-based recommendation methods, with improved accuracy and lower time cost.

Many researchers have emphasised improving the accuracy of recommendation systems, but diversity is another aspect, on which Gediminas et al. [3] focused. A traditional recommender system ranks the relevant items in descending order of predicted rating for each user and recommends the highly ranked items with high accuracy. Gediminas et al. [3] used item popularity to increase the diversity of recommendations, and the technique they use for improving diversity is flexible, efficient and parameterized.

Along with diversity, contextual information (such as the time and location of the recommendation, or information about the actor, director or writer) is another important aspect of recommendation that has received little discussion. Toine Bogers [6] proposed a context walk algorithm, based on random walks, that is used for recommendation and addresses problems related to contextual information, such as collecting context information and producing a computable formalization of it. The author creates a contextual graph in which each node (context) is connected to other nodes (contexts) and applies a random walk over it.

Apart from diversity and contextual information, Noulas et al. [5] proposed a method for ranking venues based on a random walk, which overcomes the problem of working with mobility data in collaborative filtering. A random walk approach is used to establish the relationships among check-in, social and spatial data, with items connected by a link structure. Each item has a transition probability, on the basis of which the random walk model selects an item and stays on each node for a different amount of time. The output of the random walk model is a steady-state probability, and items are ranked in decreasing order of their steady-state probabilities.

Recommendation systems are growing rapidly in social networking, and tag recommendation is one of their applications. Tag recommendation mainly helps in searching for topics, and other tags related to a topic can be recommended for a new resource. Since the main task of a recommendation system is to retrieve useful information, Jun et al. [4] proposed a method for retrieving information in an easy and suitable way, named personalised tag recommendation. The authors first create a network and then propose a topology based on tagging history and latent personalised preference, and recommend tags for those users who have the most influence on other users. The result is better than the non-personalized global co-occurrence method, even when the experiment is performed on large-scale real-world data.

Jian et al. [7] overcame limitations of collaborative filtering, such as scalability and sparsity, by proposing ERPM, a method based on a probability model. ERPM computes the predicted rating directly by finding, for each rating, the probability that the user gives that rating, and similarly the probability that the movie receives that rating from users. The MAE (Mean Absolute Error) of both methods is calculated, and the lower MAE value shows that ERPM is better than the traditional collaborative filtering method.

To overcome the poor prediction accuracy of the probability model, Amjad Almahairi [10] proposed a project named "Regression Model for Movie Ratings Prediction", which uses a regression model to increase prediction accuracy. A regression model computes its value from one independent variable and one dependent variable. The author compares the predicted ratings with those of a support vector machine and a neural network and shows some improvement, but the problems of poor prediction accuracy and low prediction speed remain.

3. Regression based recommendation approaches:

3.1 Data modelling and representation:

A typical recommendation system that provides an e-business facility contains a list of n users, represented by the set $U = \{u_1, u_2, u_3, \dots, u_n\}$. A user selects an item from a list of m items, represented by the set $I = \{i_1, i_2, i_3, \dots, i_m\}$, and the relationship between users and items is represented by the n×m matrix in Table 1. Each entry of the matrix is the rating given by a user $u_i \in U$ to an item $i_j \in I$, denoted $r_{i,j}$, meaning the rating given by user i to item j.

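|     | i1   | i2   | ... | ij   | ... | im   |
|-----|------|------|-----|------|-----|------|
| u1  | r1,1 | ...  | ... | r1,j | ... | r1,m |
| u2  | r2,1 | r2,2 | ... | ...  | ... | ...  |
| ... | ...  | ...  | ... | ...  | ... | ...  |
| ui  | ri,1 | ...  | ... | ri,j | ... | ri,m |
| ... | ...  | ...  | ... | ...  | ... | ...  |
| un  | rn,1 | rn,2 | ... | ...  | ... | rn,m |

Table 1: Representation of the user-item matrix.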
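The matrix of Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_rating_matrix`, the toy ratings and the use of `None` for a missing rating are not part of the original method.

```python
from typing import List, Optional, Tuple


def build_rating_matrix(
    ratings: List[Tuple[int, int, int]], n_users: int, n_items: int
) -> List[List[Optional[int]]]:
    """Return an n_users x n_items matrix whose entry [i-1][j-1] is r_{i,j}.

    Pairs with no rating are stored as None (an assumption of this sketch).
    """
    matrix: List[List[Optional[int]]] = [[None] * n_items for _ in range(n_users)]
    for user_id, item_id, rating in ratings:
        matrix[user_id - 1][item_id - 1] = rating  # ids start at 1
    return matrix


# Hypothetical toy data: (user id, item id, rating).
toy_ratings = [(1, 1, 5), (1, 3, 2), (2, 2, 4), (3, 1, 1)]
R = build_rating_matrix(toy_ratings, n_users=3, n_items=3)
print(R[0])  # row of user u1 over items i1..i3
```

Most entries of a real user-item matrix are empty, which is the sparsity problem mentioned in the related work.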

3.2 What is a regression model?

A regression model is a model that has both deterministic and probabilistic components. In a deterministic model, the value of one variable is predicted from another and is represented by y = f(x), meaning that the value of y is determined by x. The model is called deterministic because y depends entirely on x. In real life, however, y can rarely be determined entirely from x, so a probabilistic model is used. A probabilistic (or probability) model predicts the value of a variable on the basis of previous information and is represented by Y ~ p(y), where Y is generated at random from the probability distribution p(y); the distribution does not tell us exactly what the value of Y will be. To increase prediction accuracy, the features of both models are combined, which gives the regression model. Like a deterministic model, a regression model predicts the value of one variable from another; it is represented by Y ~ p(y|x), where Y is generated at random from the probability distribution for a known x. To construct a regression model, the values of x and y are taken from a sample of objects. Compared with other models, a regression model takes less time and/or money to retrieve the information needed to compute the prediction.
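The difference between y = f(x) and Y ~ p(y|x) can be illustrated with a small simulation. The sketch below assumes a linear f and Gaussian noise around it; both choices, along with the coefficients and function names, are illustrative assumptions rather than anything stated in the text.

```python
import random
from typing import Optional


def deterministic(x: float, a: float = 1.0, b: float = 2.0) -> float:
    """Deterministic model y = f(x): the same x always gives the same y."""
    return a + b * x


def sample_regression(
    x: float,
    a: float = 1.0,
    b: float = 2.0,
    noise_sd: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Regression model Y ~ p(y|x): Y is drawn at random around f(x).

    Gaussian noise with standard deviation noise_sd is an assumption.
    """
    generator = rng if rng is not None else random.Random()
    return generator.gauss(deterministic(x, a, b), noise_sd)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_regression(3.0, rng=rng) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)
print(deterministic(3.0), round(sample_mean, 2))
```

Individual draws differ from run to run of the sampler, but their average stays close to the deterministic value f(3), which is the sense in which the regression model combines both components.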

3.3 Problem Description:

The main goal of a recommendation system is to suggest the available items to a listed user according to the user's interest, and to generate high-quality predictions at high speed for an active user-movie pair. Instead of calculating the predicted rating from the whole database, as in ERPM, we evaluate the rating from the category of the movie using a linear regression model. The predicted rating for each user-movie pair is calculated by finding the probabilities for the user and for the movie individually and combining them with the regression value.

A movie is watched by a number of users, each of whom gives it a rating between 1, for a movie he or she likes only a little, and 5, for a movie he or she likes a lot. The quality of a movie depends on the number of users and cannot be evaluated by a single user.

Movie m is rated by $U_n$ different users. The number of users who give movie m the rating 1 is denoted $Q_1$, the number who give it the rating 2 is denoted $Q_2$, and so on. The probability $P_{m,r}$ of movie m receiving rating r is calculated by

$$P_{m,r} = \frac{Q_r}{N_m} \qquad (1)$$

where $N_m = Q_1 + Q_2 + Q_3 + \dots + Q_{r_{max}}$, r is the rating and m is the movie number. In the same way as for the movie, we now calculate the probability for the user by counting the number of movies of category c to which the user gives rating r. User $U_d$ rated $N_d$ different movies; $S_{c,1}$ is the number of movies of category c that user d rated 1, $S_{c,2}$ is the number of movies of category c that user d rated 2, and so on. The probability for the user within a movie category is calculated as

$$P_{d,c,r} = \frac{S_{c,r}}{N_d} \qquad (2)$$

where $N_d = S_{c,1} + S_{c,2} + S_{c,3} + \dots + S_{c,r_{max}}$, $r_{max}$ is the highest rating given by user d for movie m, and c is the category of the movie.
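The two probabilities can be computed by simple counting. The sketch below reads equation (1) as the share of movie m's ratings that equal r, and equation (2) as the share of user d's ratings within category c that equal r. The function names, the toy data and the simplification that every movie has exactly one category are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rating = Tuple[int, int, int]  # (user id, movie id, rating)


def movie_rating_probability(ratings: List[Rating], movie_id: int, r: int) -> float:
    """P_{m,r} = Q_r / N_m, with N_m the number of ratings of movie m."""
    counts = Counter(rt for _, m, rt in ratings if m == movie_id)
    n_m = sum(counts.values())
    return counts[r] / n_m if n_m else 0.0


def user_category_probability(
    ratings: List[Rating],
    movie_category: Dict[int, str],
    user_id: int,
    category: str,
    r: int,
) -> float:
    """P_{d,c,r} = S_{c,r} / N_d, counting only user d's ratings in category c."""
    counts = Counter(
        rt for u, m, rt in ratings
        if u == user_id and movie_category[m] == category
    )
    n_d = sum(counts.values())
    return counts[r] / n_d if n_d else 0.0


# Hypothetical toy data.
toy = [(1, 10, 5), (2, 10, 5), (3, 10, 3), (1, 11, 4), (1, 12, 5)]
category = {10: "Drama", 11: "Drama", 12: "Comedy"}

p_movie = movie_rating_probability(toy, movie_id=10, r=5)
p_user = user_category_probability(toy, category, user_id=1, category="Drama", r=5)
print(p_movie, p_user)
```

Returning 0.0 when a movie or user has no ratings is a design choice of this sketch; the text does not say how that case is handled.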

3.4 Recommendation Generation

The CRLRM method computes the prediction on the basis of the movie category, while the ERPM method computes the prediction over the whole database.

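The predicted rating is calculated by equation (3).

[Equation (3) is not recoverable from the source text.]

It relies on the probability that user d gives rating r to movie m of category c, which is calculated as

$$\Pr(\text{user } d \text{ gives rating } r \text{ to movie } m \text{ of category } c) = [P_{m,r} \times P_{d,c,r}]\,[\text{Regression}] \qquad (4)$$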

The regression value is used for prediction and is calculated by the equation:

$$\text{Regression } (y') = a + bx \qquad (5)$$

The regression model finds the regression value y' from one independent variable and one dependent variable using equation (5). Here x is the independent variable, y is the dependent variable (it does not appear in the equation but is used in computing y'), and y' is the regression value computed with the help of x and y. In the database, x is the number of users for each category, and y is the ratio of the sum of the ratings given by users for one movie category to the sum over all categories.

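a and b are two parameters based on x and y, calculated as follows:

$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \qquad (6)$$

$$a = \bar{y} - b\bar{x} \qquad (7)$$

where $\bar{x}$ and $\bar{y}$ are the means of x and y, calculated by equations (8) and (9) respectively:

$$\bar{x} = \frac{1}{k}\sum_i x_i \qquad (8)$$

$$\bar{y} = \frac{1}{k}\sum_i y_i \qquad (9)$$

Here the sums run over the k data points $(x_i, y_i)$, one per movie category. b is calculated from equation (6), and substituting its value into equation (7) gives a.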
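The parameters a and b can be obtained with the standard least-squares formulas for a straight line. The sketch below assumes that these standard formulas are what equations (6) to (9) express; the function name `fit_line` and the toy values are illustrative only.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) for y' = a + b*x using ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    k = len(x)
    x_mean = sum(x) / k  # mean of x
    y_mean = sum(y) / k  # mean of y
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = y_mean - b * x_mean  # intercept
    return a, b


# Hypothetical toy points lying exactly on y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
y_pred = a + b * 5  # regression value y' for x = 5
print(a, b, y_pred)
```

In the setting described above, x would hold the number of users per category and y the corresponding rating ratios, giving one point per category.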

Fig. 1: Number of users in each category.

Using the value of x shown in Fig. 1 for each category, together with the value of y, the regression value is computed, and from it the rating is predicted for each user-movie pair.

3.5 Advantages of the CRLRM model compared with the ERPM method

Predictions computed from the regression model in CRLRM give better prediction accuracy and increase the speed of recommendation. The reason is that CRLRM computes the predicted values for a movie from the movie's category rather than from the whole database. The second important advantage is that CRLRM removes the dependency on tunable parameters (α and β in ERPM) and calculates the prediction from a mathematical equation. Predictions are therefore improved, because in ERPM a change in the value of any parameter changes the result. Compared with other models, the CRLRM model takes less time and money to retrieve the information needed for prediction.

4. Experimental result and performance study:

4.1 Description of dataset used:

We used the MovieLens data set, which was collected by the GroupLens Research Project at the University of Minnesota. The MovieLens data set has 943 users, 1682 movies and 100,000 ratings; it was obtained by considering only those users who rated at least 20 movies, out of 50,000 users and more than 3000 different movies. The data set was converted into a user-movie matrix with 943 rows and 1682 columns, in which each entry is the rating, between 1 (for bad movies) and 5 (for good movies), given by one of the 943 users to one of the 1682 movies.

The data set stores information about users, movies and ratings in its user, movie and data files respectively. The user file contains the user id, age, gender, occupation and zip code; the movie file contains the movie id, movie title, release date, IMDb URL and list of genres; and the data file contains the user id (uid), movie id (mid), rating and timestamp. Of these, the uid from the user file, the mid and movie genres from the movie file, and the uid, mid and rating from the data file are useful for our work; the others are ignored. Movies are classified into 19 genres, which are used for computing the regression value. The experiments were performed on Windows 7 with 4 GB of main memory, a Core i3 processor and Java JDK 1.7.
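A minimal sketch of reading the data and movie files is given below. It assumes the layout of the public MovieLens 100K distribution: a tab-separated data file and a pipe-separated movie file whose last 19 fields are 0/1 genre flags. The function names and the sample lines are hypothetical, and the experiments described here were run in Java, not with this code.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def parse_ratings(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Parse data-file lines: uid <TAB> mid <TAB> rating <TAB> timestamp."""
    ratings = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        uid, mid, rating = int(row[0]), int(row[1]), int(row[2])
        ratings.append((uid, mid, rating))  # the timestamp is ignored
    return ratings


def parse_movie_genres(
    lines: Iterable[str], n_genres: int = 19
) -> Dict[int, List[int]]:
    """Map each mid to the indices of its genres.

    Assumes the last n_genres fields of every line are 0/1 genre flags.
    """
    genres: Dict[int, List[int]] = {}
    for row in csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        flags = row[-n_genres:]
        genres[int(row[0])] = [g for g, flag in enumerate(flags) if flag == "1"]
    return genres


# Hypothetical sample lines standing in for the real files.
sample_data = io.StringIO("1\t10\t4\t880000000\n2\t10\t5\t880000001\n")
flags = ["0"] * 19
flags[3] = flags[5] = "1"
sample_movies = io.StringIO(
    "10|Example Movie (1995)|01-Jan-1995||http://example.org/10|"
    + "|".join(flags) + "\n"
)

ratings = parse_ratings(sample_data)
movie_genres = parse_movie_genres(sample_movies)
print(ratings)
print(movie_genres)
```

With the real files, the same functions can be passed an open file object; the movie file may need to be opened with a Latin-1 encoding.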

4.2 Quality Evaluation metrics:

The quality of a recommendation system can be evaluated by several measures. Mean Absolute Error (MAE) is one of the most popular and is used to measure the error between the actual rating and the predicted rating for each user-movie pair.

$$\text{MAE} = \frac{1}{N}\sum_{m=1}^{N} \left| r_m - pr_m \right|$$

The actual rating is denoted by $r_m$, the predicted rating by $pr_m$, and N is the total number of items. The lower the MAE value, the better the predictions of the proposed recommendation algorithm.
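MAE is the average absolute difference between actual and predicted ratings, and it can be computed as in the following sketch. The function name and the toy ratings are illustrative assumptions.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |r_m - pr_m| over all N rated items."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy ratings.
actual = [4, 3, 5]
predicted = [3.5, 3.0, 4.0]
mae = mean_absolute_error(actual, predicted)
print(mae)  # 0.5
```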

4.3 Quality Experiment:

We compare the CRLRM method with the ERPM method on the data of 100 users and 1682 movies and compute the MAE (Mean Absolute Error) value for both methods.

| Parameter | ERPM        | CRLRM       |
|-----------|-------------|-------------|
| MAE       | 3.364328638 | 2.035591765 |

Table 2: Comparison of MAE values for 100 users and 1682 movies.
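As a quick arithmetic check on the reported MAE values, the relative reduction in MAE from ERPM to CRLRM can be computed directly. Expressing the improvement as a relative MAE reduction is an assumption of this sketch; the variable names are illustrative.

```python
# MAE values reported for 100 users and 1682 movies.
mae_erpm = 3.364328638
mae_crlrm = 2.035591765

# Relative reduction in MAE when moving from ERPM to CRLRM.
relative_reduction = (mae_erpm - mae_crlrm) / mae_erpm
print(f"{relative_reduction:.1%}")  # 39.5%
```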

A lower MAE (Mean Absolute Error) value means less error in prediction, and Table 2 shows that the CRLRM method predicts more accurately. For a better prediction, the difference between the actual ratings and the CRLRM predictions should be lower than the difference between the actual ratings and the ERPM predictions. Table 3 presents the sum of ratings for each group of 20 users.

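| Users | Sum of ratings (Actual) | Sum of ratings (CRLRM) | Sum of ratings (ERPM) |
|-------|-------------------------|------------------------|-----------------------|
| 20    | 3582                    | 4111.04                | 4669.5607             |
| 40    | 1452                    | 1630.97                | 1795.074              |
| 60    | 2381                    | 2769.52                | 3151.2033             |
| 80    | 1745                    | 2072.56                | 2429.1058             |
| 100   | 3171                    | 3698.88                | 4255.2204             |

Table 3: Sum of ratings of 100 users, in groups of 20 users.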

Table 3 shows that the CRLRM method predicts more accurately than the ERPM method, and the results show that CRLRM improves prediction accuracy by 30-40% over 11,000 ratings for 100 users.

A graph is plotted of users against the sum of ratings, with users on the x axis and the sum of ratings on the y axis.

Fig. 2: Comparison of the prediction accuracy of CRLRM and ERPM.

The graph shows that the CRLRM method predicts more accurately than the ERPM method. The graph is made for 100 users and 1682 movies.

5. Conclusion:

A recommendation system is a computer-based intelligent technology that helps users find interesting products in a list of available items on the basis of user demand. The ERPM method used for movie recommendation suffers from poor prediction accuracy and takes more time to produce recommendations. To overcome these problems, in this paper we proposed a novel method, CRLRM (Category based Recommendation using Linear Regression Model), which predicts the rating from the category of the movie using a regression model and copes with dynamic changes in the data. The experiments show that the CRLRM method gives better prediction accuracy and speeds up the recommendation process.

References:

[1] Yi Cai, Ho-fung Leung, Qing Li, Huaqing Min, Jie Tang and Juanzi Li, "Typicality-based Collaborative Filtering Recommendation", IEEE Transactions on Knowledge and Data Engineering, Jan 2013.

[2] Xiang Cui, Guisheng Yin, "Method of Collaborative Filtering Based on Uncertain User Interests Cluster", Journal of Computers, vol. 8, no. 1, January 2013, pp. 186-193.

[3] Gediminas Adomavicius, YoungOk Kwon, "Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques", IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 5, May 2012.

[4] Jun Hu, Bing Wang, Yu Liu, De-Yi Li, "Personalized Tag Recommendation Using Social Influence", Journal of Computer Science and Technology, 27(3): 527-540, May 2012.

[5] Anastasios Noulas, Salvatore Scellato, Neal Lathia, Cecilia Mascolo, "A Random Walk Around the City: New Venue Recommendation in Location-Based Social Networks", 2012.

[6] Toine Bogers, "Movie Recommendation using Random Walks over the Contextual Graph", 2010.

[7] Jian Chen, Jin Huang, Huaqing Min, "Easy Recommendation Based on Probability Model", IEEE, 2008, pp. 441-444.

[8] David Goldberg, David Nichols, Brian M. Oki and Douglas Terry, "Using Collaborative Filtering to Weave an Information Tapestry", Communications of the ACM, Dec 1992.

[9] Marco Gori, Augusto Pucci, "ItemRank: A Random-Walk Based Scoring Algorithm for Recommender Engines", IJCAI, 2007.

[10] Amjad Almahairi, "Regression Model for Movie Ratings Prediction", McGill University School of Computer Science, Dec 2009.

[11] Sangkil Moon, Paul K. Bergey and Dawn Iacobucci, "Dynamic Effect of Movie Rating on Movie Revenues and Viewer Satisfaction", June 2009.

[12] Badrul Sarwar, George Karypis, Joseph Konstan and John Riedl, "Item-based Collaborative Filtering Recommendation Algorithms", ACM, 2001.

[13] www.movielens.org

[14] http://www.psychstat.missouristate.edu/introbook/sbk16.htm

[15] http://courses.ttu.edu/isqs5349-westfall/images/5349/deterministic_stochastic.htm


