Problem In Collaborative Recommender Systems


02 Nov 2017


Abstract

Collaborative filtering is the best-known approach to building a recommender system. It draws on the ratings of users who have similar rating profiles or rating patterns, so it can compute the similarity between users only when they have expressed enough ratings. A major challenge for collaborative filtering is therefore how to make recommendations for a new user, known as the cold-start user problem. A few efficient methods address this problem with the ask-to-rate technique: the profile of a new user is built from a quick interview in which selected items are presented to the user for rating. This paper provides an overview of the efficient methods based on the ask-to-rate technique. They are categorized into static and dynamic methods, and then explained and compared.

Keywords: Recommender systems, Collaborative filtering, New user, User cold start.

Introduction

Artificial intelligence systems have a remarkable impact on users, helping them manage the massive amount of data on the Internet. Personalized search engines, intelligent software agents and recommender systems appeal to users who need help in sorting, classifying, personalizing, filtering and sharing large amounts of information. One of the most common recommendation techniques is Collaborative Filtering (CF) [1-3], which offers items to a user based on the items previously rated by like-minded users. The essential assumption is that if users X and Y rate n items similarly, or behave similarly, they will rate or act on other items similarly [4]. A major challenge for CF is therefore how to make recommendations for a new user who has just entered the system, known as the cold-start user problem. In other words, the system must gather information about the new user before that user can make full use of it.

To solve the cold-start user problem, a few efficient methods have been proposed based on the ask-to-rate technique [5], in which the new user is asked to rate selected items until a sufficient number of items have been rated. The methods fall into two categories, static and dynamic, depending on whether or not the same items are presented to "all" new users. In this paper, both static and dynamic methods are explained and the efficient methods in each category are reviewed.

The rest of this paper is organized as follows. Recommender systems are introduced in Section 2. The concept of CF recommender systems is described in Section 3. A comprehensive survey of the ask-to-rate technique and some of its efficient methods is given in Section 4. Finally, Section 5 discusses related methods and concludes this work.

Recommender systems

Recommender systems are a subset of information filtering systems. They are used as efficient tools to overcome information overload by inspecting a large set of information and selecting what is relevant to each user. Recommendation and rating prediction apply to items such as movies, music and books, or to social entities such as people or groups, that the user has not yet seen. When a recommender system can predict ratings for items that have not yet been seen, those items can be recommended to the target user, that is, the user for whom the recommendations are made. A movie recommender system, for example, might store explicit or implicit user ratings in order to recommend new movies to the same user based on the ones she has already seen.

As to how recommendations are produced, [6] provides a taxonomy of five techniques: Collaborative Filtering, Content-based, Demographic [7, 8], Utility-based [9] and Knowledge-based. A further category, hybrid systems, overcomes the limitations of these methods by combining techniques so that the advantages of one offset the disadvantages of another. Several ways of combining them into a new hybrid system have been proposed (see [6] for precise descriptions, where seven categories of hybrid system are presented). We describe only CF systems here, since repeating the detailed explanation of the other categories would be redundant; interested readers are referred to the original articles [1, 6, 10].

Collaborative filtering recommender systems

Collaborative filtering recommender systems are one of the biggest sub-domains of information retrieval. The basic aim of these systems is to find users with interests similar to those of the active user and to aggregate their opinions to make a recommendation. A CF system therefore calculates the similarity between users rather than between the content of items. Given the amount of information available, both users and website owners benefit from CF systems: users come across items they prefer, and the profits of e-commerce websites can rise because the system persuades users to buy more related products or accessories.

In a typical CF scenario, the input data is a list of m users, U = {u1, u2, …, um}, and n items, I = {a1, a2, …, an}. Each row vector corresponds to a user profile and represents a particular user's item ratings; each column vector corresponds to a specific item's ratings by all m users. The ratings may be collected explicitly, as numeric ratings (such as a 1 to 5 star scale) or binary ratings (such as like/dislike), or derived implicitly from the user's actions (such as analyzing time spent or mining hyperlinks) [11]. They are usually stored in an m × n user-item rating matrix A, such that Aij represents the preference score (rating) of the ith user for the jth item, with a special value (empty, Ф or zero) if the ith user has not yet rated the jth item. The task of a CF system is to find suitable items according to the target user's preferences, either by predicting the active user's rating for a target item or by recommending a list of items (Top-N recommendation) to the active user [3].
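
As an illustration of this layout, the following minimal Python sketch stores a toy rating matrix as a list of rows, with None standing for "not yet rated". The data and the helper names user_profile and item_ratings are hypothetical and are not taken from the cited works.

```python
# Toy user-item rating matrix A (m = 3 users, n = 4 items).
# None marks an item the user has not rated yet (illustrative data only).
A = [
    [5, 3, None, 1],     # u1
    [4, None, None, 1],  # u2
    [None, 1, 5, 4],     # u3
]


def user_profile(A, i):
    """Row i of A: the ratings given by user i, keyed by item index."""
    return {j: r for j, r in enumerate(A[i]) if r is not None}


def item_ratings(A, j):
    """Column j of A: the ratings received by item j, keyed by user index."""
    return {i: row[j] for i, row in enumerate(A) if row[j] is not None}


print(user_profile(A, 1))  # {0: 4, 3: 1}
print(item_ratings(A, 1))  # {0: 3, 2: 1}
```

A brand-new user would appear here as a row containing only None values, which is exactly the situation the cold-start user problem describes.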

Researchers have classified the many algorithms for collaborative recommendation into Memory-based and Model-based CF [1, 2]; some studies have also suggested hybrid algorithms that combine the advantages and alleviate certain drawbacks of the two [6, 12]. This section focuses on a common memory-based CF algorithm, User-based kNN (k-Nearest Neighbors).

Memory-based algorithms are essentially heuristics. In User-based kNN [13], the system predicts the utility of a target item using statistical techniques that find other users with similar tastes.

First, the similarity between the target user, $u_t$, and every other user, $u_i$, who has rated the target item, $a_t$, is computed with a measure such as Pearson's Correlation (shown below), the Cosine measure, or a more recent measure such as Proximity–Impact–Popularity [14]; the measure reflects the distance, correlation, or weight between two users.

$$sim(u_t, u_i) = \frac{\sum_{m=1}^{h} (r_{u_t,m} - \bar{r}_{u_t})(r_{u_i,m} - \bar{r}_{u_i})}{\sqrt{\sum_{m=1}^{h} (r_{u_t,m} - \bar{r}_{u_t})^2}\,\sqrt{\sum_{m=1}^{h} (r_{u_i,m} - \bar{r}_{u_i})^2}}$$

where $r_{u,m}$ is the rating of item m by user u, $\bar{r}_{u}$ is the average rating of user u over all the co-rated items, and h is the number of items co-rated by both users. The similarity ranges between -1 and 1.
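
The Pearson measure can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the cited systems: each user is assumed to be a dictionary mapping item identifiers to ratings, the means are taken over the co-rated items as described above, and returning 0.0 when fewer than two items are co-rated or when a user's co-rated ratings are all identical is an assumption made here for robustness.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items.

    ratings_a / ratings_b: dicts mapping item id -> rating.
    """
    common = set(ratings_a) & set(ratings_b)
    h = len(common)
    if h < 2:
        return 0.0  # assumption: too little overlap is treated as no evidence
    mean_a = sum(ratings_a[m] for m in common) / h
    mean_b = sum(ratings_b[m] for m in common) / h
    numerator = sum(
        (ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in common
    )
    norm_a = sqrt(sum((ratings_a[m] - mean_a) ** 2 for m in common))
    norm_b = sqrt(sum((ratings_b[m] - mean_b) ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a user with constant ratings gives no signal
    return numerator / (norm_a * norm_b)


alice = {"m1": 1, "m2": 2, "m3": 3}
bob = {"m1": 2, "m2": 4, "m3": 6, "m4": 5}   # same trend as alice
carol = {"m1": 3, "m2": 2, "m3": 1}          # opposite trend

print(round(pearson(alice, bob), 6))    # close to 1
print(round(pearson(alice, carol), 6))  # close to -1
```

A new user with an empty dictionary always receives 0.0 here, which shows concretely why user-based CF has nothing to work with before the first ratings arrive.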

Second, once the similarities are ready, the prediction for the target item by the target user is calculated using at most the k nearest neighbors, found in the previous step, who have also rated the target item:

$$p_{u_t,a_t} = \bar{r}_{u_t} + \frac{\sum_{i=1}^{k} (r_{u_i,a_t} - \bar{r}_{u_i})\, sim(u_t, u_i)}{\sum_{i=1}^{k} |sim(u_t, u_i)|}$$

where $\bar{r}_{u_t}$ and $\bar{r}_{u_i}$ are the average ratings of the target user and of user i on all other rated items, and $sim(u_t, u_i)$ is the similarity between the target user and user i. Generally, approaches differ in how "ratings", "k" and "similarity" are defined.
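
The prediction step can be illustrated with a short Python sketch. It assumes the common mean-centred weighted average, in which each neighbor's deviation from its own mean is weighted by its similarity to the target user and normalized by the sum of absolute similarities. The function name, the tuple layout and the numbers are hypothetical and chosen only for illustration.

```python
def predict_rating(target_mean, neighbours, k):
    """Predict the target user's rating of the target item.

    neighbours: list of (similarity, neighbour_mean, neighbour_rating) tuples,
    one per user who has rated the target item.
    Only the k most similar neighbours are used.
    """
    top_k = sorted(neighbours, key=lambda n: n[0], reverse=True)[:k]
    norm = sum(abs(sim) for sim, _, _ in top_k)
    if norm == 0:
        return target_mean  # assumption: fall back to the user's own mean
    weighted = sum(sim * (rating - mean) for sim, mean, rating in top_k)
    return target_mean + weighted / norm


# Illustrative numbers only: three candidate neighbours, k = 2.
neighbours = [
    (1.0, 3.0, 5),  # very similar user who rated the item above their mean
    (0.5, 4.0, 3),  # moderately similar user who rated it below their mean
    (0.1, 2.0, 2),  # weakly similar user, dropped because k = 2
]
print(predict_rating(3.0, neighbours, k=2))  # 4.0
```

With these numbers the two retained neighbors contribute 1.0 × 2 + 0.5 × (-1) = 1.5, which is divided by 1.5 and added to the target mean of 3.0.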

One advantage of memory-based CF algorithms is that the underlying idea is intuitive, which makes them easy to understand and their results easy to explain. Furthermore, the main strength of pure CF systems is that new data can be added incrementally and without difficulty: unlike content-based filtering, they do not require any tagging of the items' content, and recommendations use the ratings data alone. This approach is therefore suitable for any domain, especially domains where content is scarce (such as restaurants) or difficult to acquire (such as movies or music). Collaborative systems have their own limitations, such as the Cold-start problem [5, 15-17], Scalability [18] and Sparsity [4]. An important one, the cold-start user problem, occurs when a user who is new to the recommender system enters it and has not yet given any ratings. User-based CF cannot compute the similarity between the new user and other users [15-17, 19-21], so it is difficult to make accurate recommendations.

Different techniques have been proposed to solve this problem. The ask-to-rate technique [5, 15, 17, 19-21] is the most direct way to obtain some information about the new user and learn the user's preferences. The next section explains the ask-to-rate technique.

Ask for explicit ratings

The most direct way to cope with the user cold-start problem and quickly build a profile of a new user is to ask for explicit ratings by presenting items to the user. This elicits initial information about the new user through a quick, short interview. Once some items have been presented and rated, the process is complete: the new user's row in the user-item matrix is no longer empty, and the new user enters the normal phase of the recommender system. The CF system then uses these ratings to compute the similarity between the new user and other users, so that she receives accurate recommendations.

The system must take care to present informative items that gather useful information before a new user is allowed to use the system normally. Recommendation accuracy can be improved if the ratings are obtained with a well-designed selection strategy rather than one in which users self-select the items to rate.

Generally, elicitation techniques should not feel burdensome to new users; they must aim to minimize user effort and maximize recommendation accuracy. The elicitation methods of [15, 19] are evaluated on user-effort and accuracy metrics, and the methods themselves are described in the following sections. This paper provides an overview of the efficient methods based on the ask-to-rate technique, categorized into static and dynamic methods according to how the next items are selected.

4.1 Static Methods

Static techniques present the same items to "all" new users, regardless of what has been learned about the user being interviewed. In most of these methods, the computation is based on information theory. Their advantage is that the item order only needs to be calculated once. Various static strategies have been proposed, such as the Entropy and Variance strategies [20]; the Random, Popularity, Pure entropy and Balanced strategies [15]; and the Entropy0 and HELF strategies [19]. The details of each static strategy are as follows.

Active WebMuseum is the first CF recommender system to use the ask-to-rate technique [20]. This web-based virtual museum has a dynamic topology in which paintings are personalized and ordered according to the visitor's taste and preferences. The paper proposes the Entropy and Variance methods to order the sequence of items to be rated by new users. These methods are based on a statistical analysis of the distribution of each item's ratings given by other users in the dataset. For example, the items with the highest entropy scores can be presented first. The experiments use the Random strategy (selecting items to present without prior planning) as a baseline and show that these two methods generate more accurate predictions for new users than the Random strategy.

In 2002, the MovieLens research group extended this idea in web personalization [15]. The study proposes several strategies, drawing on information theory, aggregate statistics, and balanced and personalized techniques, to learn about new users. These strategies address the question of which items should be presented to the new user during an initial interview. The strategies for selecting movies were tested in offline and online experiments using the MovieLens data set. The evaluation considered user effort and recommendation accuracy as they relate to the user experience. All of the proposed methods were measured on rating-prediction accuracy with the MAE (Mean Absolute Error) metric. The methods are as follows:

Random strategy: selects items at random, without any intelligent analysis of the rating matrix. The online and offline experiments show that it requires far more user effort and yields poor prediction accuracy, although it learns about the new user's preferences across all available items. Random is a baseline strategy used for comparison.

Popularity strategy: this strategy takes an item's popularity into account, i.e. how many users have rated the item. Items are ordered by the number of ratings they have received from all users, and some of the most popular items are presented to the new user. The popularity of item $a_t$ is computed as

$$Popularity(a_t) = \left|\{\, u \in U : r_{u,a_t} \text{ exists} \,\}\right|$$

where $r_{u,a_t}$ is the rating of the item by user u.

This method is easy to implement and inexpensive to compute, and it achieves the important goal of minimizing user effort. However, the ratings may be uninformative, since most users like popular items. Moreover, the popularity measure suffers from prefix bias: popular items keep accumulating ratings while unpopular items do not. This causes an uneven distribution of ratings in the dataset.
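
A minimal Python sketch of the Popularity ordering, assuming the ratings are available as a dictionary from item to the list of ratings it has received (the data and names are illustrative only):

```python
def popularity_order(ratings_by_item):
    """Order items by how many ratings they have received, most rated first."""
    return sorted(
        ratings_by_item,
        key=lambda item: len(ratings_by_item[item]),
        reverse=True,
    )


# Illustrative data: item -> ratings given by existing users.
ratings_by_item = {
    "A": [5, 4, 4],
    "B": [3],
    "C": [1, 5],
}
print(popularity_order(ratings_by_item))  # ['A', 'C', 'B']
```

Note that the rating values themselves play no role in the ordering, which is why the ratings gathered this way can turn out to be uninformative.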

Pure entropy: another low-complexity method for item selection is entropy, which was proposed in [20] and revisited in [15]. The entropy of a target item is the dispersion of the item's ratings in the rating matrix:

$$H(a_t) = -\sum_{i=1}^{5} p_i \log(p_i)$$

where $p_i$ is the empirical probability that a rating of $a_t$ equals i (i ranges over the possible ratings, here 1 to 5). The not-yet-rated items with the highest scores are then presented. This method yields a lot of information per rating, but some of it is of little use to the system, and it sometimes produces "garbage" (pointless or useless) patterns, since it does not take rating frequency into account. In the offline experiment, Entropy, like the Random strategy, required a great deal of user effort and performed extremely poorly on accuracy. It was not evaluated in the online experiment. Hence, in terms of accuracy and user effort, the Entropy and Random methods lag behind all the other methods in [15].
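
The entropy score can be sketched as follows, assuming Shannon entropy with base-2 logarithms (the base is an assumption; it changes the scores only by a constant factor and leaves the item ordering unchanged). The sample ratings are illustrative.

```python
from collections import Counter
from math import log2


def rating_entropy(item_ratings):
    """Shannon entropy (base 2) of the rating values an item has received.

    Missing ratings are simply absent from item_ratings, i.e. ignored.
    """
    total = len(item_ratings)
    counts = Counter(item_ratings)
    return -sum((c / total) * log2(c / total) for c in counts.values())


print(rating_entropy([1, 5, 1, 5]))  # 1.0: opinions split between two values
print(rating_entropy([1, 2, 3, 4]))  # 2.0: four equally likely values
```

An item rated by only two users who disagree scores as high as an item rated by thousands who are evenly split, which is the frequency blindness described above.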

Balanced strategy: multiplies the logarithm of an item's popularity score (which reflects how likely the user is to have rated the item) by its entropy, that is, (Log Popularity) × Entropy, and presents items in descending order of this score. This method combines the advantages of its two components. It achieves the best accuracy and requires moderate user effort.
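
The (Log Popularity) × Entropy score can be sketched as below. The sketch assumes that popularity is the number of ratings an item has received and that logarithms are base 2; the data and function names are illustrative.

```python
from collections import Counter
from math import log2


def rating_entropy(item_ratings):
    """Shannon entropy (base 2) of an item's observed rating values."""
    total = len(item_ratings)
    return -sum(
        (c / total) * log2(c / total) for c in Counter(item_ratings).values()
    )


def balanced_score(item_ratings):
    """(log popularity) * entropy; unrated items score 0 by assumption."""
    if not item_ratings:
        return 0.0
    return log2(len(item_ratings)) * rating_entropy(item_ratings)


def balanced_order(ratings_by_item):
    """Items in descending order of their balanced score."""
    return sorted(
        ratings_by_item,
        key=lambda item: balanced_score(ratings_by_item[item]),
        reverse=True,
    )


ratings_by_item = {
    "popular_but_unanimous": [5] * 8,            # log2(8) * 0 = 0
    "controversial_but_rare": [1, 5],            # log2(2) * 1 = 1
    "popular_and_controversial": [1, 5] * 4,     # log2(8) * 1 = 3
}
print(balanced_order(ratings_by_item))
```

The item that is both widely rated and divisive is ranked first, while an item that is only popular or only divisive falls behind it.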

In 2008, the MovieLens research group further extended the ask-to-rate idea in [19], improving the order of items and eliciting the opinions of new users more precisely at registration time. This paper won the Yahoo! Research Best Paper Award [22]. The authors propose an offline simulation framework and an online experiment with real users of the MovieLens live recommender system. Three new information-theoretic strategies are presented: Entropy0, HELF and IGCN. The details of the two static strategies are:

Entropy0 (Entropy Considering Missing Values): in [15], missing ratings (non-ratings) are ignored in the entropy calculation. Entropy0 handles items with missing evaluations by assigning all missing evaluations to a separate category, "0", alongside the usual "1-5" scale, and then applying a weighted entropy formulation:

$$Entropy0(a_t) = -\frac{1}{\sum_{i=0}^{5} w_i} \sum_{i=0}^{5} p_i\, w_i \log(p_i)$$

where w0 = 0.5 is the weight given to missing values and wi = 1 for i = 1, 2, 3, 4, 5, since this choice of weights gave the best results in the original experiments. Note that setting w0 = 0 turns Entropy0 into the pure entropy measure. The Entropy0 method performed extremely well on both dimensions of the new user problem (user effort and accuracy). It is slightly more successful than the Popularity method.
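
A hedged Python sketch of Entropy0 follows. It assumes the weighted form -(1/Σ w_i) Σ p_i w_i log(p_i) over the categories 0 to 5, with the probabilities computed over all users so that missing ratings fall into category 0, and with base-2 logarithms. The function name and the toy numbers are illustrative.

```python
from collections import Counter
from math import log2


def entropy0(item_ratings, n_users, w0=0.5):
    """Weighted entropy with missing ratings counted as category 0.

    item_ratings: ratings (values 1..5) the item has received.
    n_users: total number of users, so n_users - len(item_ratings) are missing.
    """
    weights = {0: w0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}
    counts = Counter(item_ratings)
    counts[0] = n_users - len(item_ratings)  # the "missing" category
    weighted_sum = 0.0
    for value, count in counts.items():
        if count == 0:
            continue  # a category with zero probability contributes nothing
        p = count / n_users
        weighted_sum += weights[value] * p * log2(p)
    return -weighted_sum / sum(weights.values())


# Same two conflicting ratings, but in communities of different size.
print(entropy0([1, 5], n_users=4))
print(entropy0([1, 5], n_users=1000))
```

Unlike pure entropy, the score of the two-rating item shrinks as the community grows, because almost all of the probability mass then sits in the down-weighted missing category.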

HELF (Harmonic mean of Entropy and Logarithm of Frequency): this strategy is a hybrid of Popularity and Entropy, like the Balanced strategy [15]. HELF takes the harmonic mean of the Popularity (rating frequency) and Entropy scores of items, two measures that are not correlated with each other. HELF is defined as

$$HELF(a_t) = \frac{2 \cdot LF'(a_t) \cdot H'(a_t)}{LF'(a_t) + H'(a_t)}$$

where $LF'(a_t)$ is the normalized logarithm of the rating frequency of the target item and $H'(a_t)$ is the normalized entropy of the target item. The harmonic mean rises or falls strongly when both components increase or decrease together.
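
The following Python sketch computes a HELF-style score as the harmonic mean of two normalized components. The normalizations are assumptions made for this sketch: the log frequency is divided by the log of the total number of users, and the entropy by the log of the number of rating levels (5), so that both lie between 0 and 1. Names and data are illustrative.

```python
from collections import Counter
from math import log2


def rating_entropy(item_ratings):
    """Shannon entropy (base 2) of an item's observed rating values."""
    total = len(item_ratings)
    return -sum(
        (c / total) * log2(c / total) for c in Counter(item_ratings).values()
    )


def helf(item_ratings, n_users, n_levels=5):
    """Harmonic mean of normalized log frequency and normalized entropy."""
    if not item_ratings or n_users < 2:
        return 0.0
    lf = log2(len(item_ratings)) / log2(n_users)      # normalized log frequency
    h = rating_entropy(item_ratings) / log2(n_levels)  # normalized entropy
    if lf + h <= 0:
        return 0.0
    return 2 * lf * h / (lf + h)


n_users = 4
print(round(helf([1, 5, 1, 5], n_users), 3))  # rated by everyone, divisive
print(round(helf([1, 5], n_users), 3))        # divisive but rated by few
print(helf([3, 3, 3, 3], n_users) == 0)       # rated by everyone, unanimous
```

Because a harmonic mean is pulled toward its smaller argument, an item must score reasonably on both frequency and entropy to rank highly.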

4.2 Dynamic methods

We call an approach dynamic when the items selected are tailored to "each" new user. To present the items that best fit the user's personal preferences, the system adapts to the earlier ratings given by the new user. Users therefore rate items in a personalized order, and the interview process is controlled more effectively than in static approaches. Dynamic approaches take into account the ratings the user has given so far during the initial interview and the system's evolving profile of the new user; in this way, the number of items familiar to the new user is maximized. A few dynamic ask-to-rate approaches for personalized item ordering address the cold-start user problem, such as Item-Item personalized [15] and IGCN [19]. The details of each strategy are:

Personalized: in the Item-Item personalized strategy, items are proposed until the user gives at least one rating; the similarity between items is then computed using a recommender, and the items that the user would be most likely to buy (or to have seen, in the movie domain) are presented. Whenever the user gives more ratings, the list of similar movies is updated. The evaluation over both experiments shows that it achieves the best user effort, on a par with the Popularity and Entropy0 strategies, and the worst accuracy, on a par with the Entropy and Random strategies, since the approach tends not to identify items that the user will like [15].

IGCN (Information Gain through Clustered Neighbors): to achieve an adaptive selection strategy, [19] first considers using decision trees. The users are initially clustered into groups, and a decision tree algorithm such as ID3 is then used to find the right cluster for the target user and learn the user's profile. This approach takes into account the items rated so far by a user. The target user follows a path through the decision tree from the root node (the item with the highest information gain) to a leaf node (which gives the user's true class or neighborhood).

However, the authors set aside this ideal decision tree scenario because it may not be practically feasible for most members of a recommender system, and instead propose a two-phase algorithm named IGCN. Before the first phase starts, user clusters are created using the bisecting k-means approach and the information gain (IG) of each item is computed. In the first phase, the non-personalized step, the user rates items ordered by their information gain scores to build an initial profile, until the user has rated at least a threshold number of items. In the second phase, the personalized step, a richer profile is built: the information gain of the items is computed using only the best neighbors of the target user, continuing until the best neighbors no longer change. IGCN requires a predefined clustering of the users. An online experiment run for 20 days with 468 users showed that the IGCN approach offers greater accuracy than all the other proposed information-theoretic measures.
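
The information gain score used to order items can be illustrated with a small Python sketch. It is not the IGCN implementation of [19]: the cluster labels are assumed to be given (the bisecting k-means step is not shown), information gain is taken in the ID3 sense as the reduction in cluster-label entropy after splitting users by their rating of the item, and 0 is assumed to stand for "not rated". In the personalized phase the same function would be applied to the best neighbors only. All names and data are illustrative.

```python
from collections import Counter, defaultdict
from math import log2


def label_entropy(labels):
    """Shannon entropy (base 2) of a list of cluster labels."""
    total = len(labels)
    return -sum(
        (c / total) * log2(c / total) for c in Counter(labels).values()
    )


def information_gain(item_column, clusters):
    """Reduction in cluster entropy obtained by knowing the item's rating.

    item_column[u]: user u's rating of the item (0 = not rated, by assumption).
    clusters[u]:    cluster id of user u (assumed to be precomputed).
    """
    groups = defaultdict(list)
    for rating, cluster in zip(item_column, clusters):
        groups[rating].append(cluster)
    remainder = sum(
        len(group) / len(clusters) * label_entropy(group)
        for group in groups.values()
    )
    return label_entropy(clusters) - remainder


clusters = [0, 0, 1, 1]  # two user clusters of equal size
items = {
    "splits_clusters": [5, 5, 1, 1],   # rating reveals the cluster exactly
    "same_everywhere": [3, 3, 3, 3],   # rating reveals nothing
    "mixed": [5, 1, 5, 1],             # both clusters disagree internally
}
ranked = sorted(
    items, key=lambda a: information_gain(items[a], clusters), reverse=True
)
print(ranked[0])  # splits_clusters
```

An item whose ratings separate the clusters cleanly is asked about first, since a single answer then says the most about which neighborhood the new user belongs to.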

A brief comparison of the classes of methods for alleviating the user cold-start problem, together with their advantages and disadvantages, is given in a summary table.

Conclusion

In summary, the objective of a recommender system is typically to recommend the items that best fit the user's personal preferences. Collaborative filtering systems generate recommendations based on user-user similarity. A new user poses a serious problem for the collaborative filtering approach: since the system has no data about the new user's preferences, it cannot provide any personalized recommendations for her and must first acquire some data about the new user. In this paper we have reviewed several methods for dealing with the new user problem via the ask-to-rate technique. The methods are categorized as static or dynamic.

Although a few efficient methods have been proposed for the cold-start new user problem, it is not yet a solved problem. During item selection, the new incoming ratings of other users are not considered. A future direction is therefore to develop a new method that adapts to the ratings given by other users as they arrive.


