Personalization Using Query Log and Clickthrough Data

Published Date: 02 Nov 2017

Abstract

Personalization of a web page involves dynamically altering the contents of web pages according to the preferences or interests of users, so that relevant information is retrieved for a given query. Although many personalization algorithms have been explored, it is still uncertain whether personalization is effective for different queries, different users, and different search scenarios. Most personalization approaches are based on user query logs or user profiles. Personalized web search that uses query logs alone may not be effective and may not match a user's preferences. In this paper, we propose a novel re-ranking framework for personalized information retrieval that integrates query logs and clickthrough data, together with a re-ranking approach built on these two sources. The approach combines the user's search context and browsing behavior to make information retrieval more effective. Our framework derives an extended set of conceptual preferences for a user based on the concepts extracted from the search. Experimental results show that the framework can be highly effective for personalized web search and information retrieval.

Keywords: Search Engine, Personalization, Clickthrough Data, Information Extraction, Query Log

1. INTRODUCTION

The Internet has changed the way humans interact with information. The sheer volume of available information has made processing and filtering it to find what users are looking for a difficult and time-consuming task. When users search for information, they would like the results most likely to satisfy their query to appear first. However, users generally describe their information needs in very few keywords or short phrases, which makes retrieval much harder in a web scenario given the large and dynamic content of the web. This stirs large-scale querying and browsing activity on topics of interest to users over a sizable time window, activity that is unusual relative to normal patterns of querying and browsing behavior. Many techniques and approaches have been investigated for discovering web access patterns from web usage logs for web personalization [4][6][13][14][21].

Web usage mining analyzes the behavior of users according to the data recorded in search engine and web site access logs. Web search engines accumulate a large amount of search logs in the form of user clickthrough data. These logs typically contain user-submitted search queries, followed by the URLs of the web pages that users clicked on the corresponding search result page. Although these clicks do not reflect exact relevance, they provide valuable indications of the users' intentions by associating a set of query terms with a set of web pages. If a user clicks on a web page link, it is likely that the linked page is relevant to the query, or at least related to some extent. Several applications have been proposed along this direction, such as term suggestion [3], query expansion [4], and query clustering [5][10].

Current information extraction systems (or search engines) return a long list of matching results based on keywords. However, users typically view only the top few documents from the list returned by the search engine [2][4]. This makes it necessary to show the most relevant documents at the top in order to improve user satisfaction. Without knowledge of the user's information need this task is difficult, because the "relevance" of a document depends on both the individual user and the individual query. A dilemma arises when learning from preferences. For example, for the query Q1 = "Rock", user U1 prefers document D1 over document D2, while for the same query user U2 prefers D2 over D1. In such a scenario no single ranking of D1 and D2 satisfies both users. The major challenges for personalized search are therefore to model the appropriate user context and to learn a user model that improves search accuracy.

In this paper, we present a framework for personalization that integrates query logs and clickthrough data. Our approach finds the most relevant documents for a user based on a given query. Our focus is to evaluate the association between user queries and clickthrough data and to customize search results according to each individual's preferences and interests. We also present a re-ranking procedure to score the web pages retrieved using this approach. The paper is organized as follows. Section 2 presents related work on personalized web search. Section 3 presents the proposed framework and the re-ranking procedure for personalized information retrieval, which combines the user's context and browsing behavior. Section 4 describes the experimental results and Section 5 concludes the paper.

2. RELATED WORK

Personalized web search needs to customize the content and structure of a web site so as to present the web resources most relevant to each individual's specific needs. This can be achieved by taking advantage of the user's navigational behavior, as revealed through the processing of web usage logs, together with the user's characteristics and interests. There have been several prior attempts at personalizing web search, each focusing on a particular personalization strategy.

2.1 Query Logs

Query logs are data on user activity that are saved automatically on search engine servers. A query log record consists of attributes such as session ID, IP address, time-stamp, query string, number of results on the results page, and results page number. Relevance clickthrough data are also saved in the log, consisting of the clicked URL, the associated query, the position on the results page, and a time-stamp. Query logs can also be captured on the client side, i.e., on the user's computer. For user personalization, clickthrough data and query logs can be the most important sources. Some related work is described below.

Sugiyama et al. [2] used the web browsing history of the past N days for personalized search. They partition the browsing history into three categories according to the time stamp: data from before today, data from today but before the current session, and data from the current session. They found that the performance obtained using web browsing history is competitive with that obtained using relevance feedback. Speretta et al. [1] also used users' search histories to construct user profiles. Several other works [7][8][9][11] have used past queries mined from query logs to support user personalization.
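The three-way time partition described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `partition_history`, the `(timestamp, url)` event format and the example URLs are hypothetical and do not come from [2].

```python
from datetime import datetime

def partition_history(events, session_start):
    """Split (timestamp, url) events into the three time-based groups.

    Assumption: the current session begins at `session_start`, and "today"
    is the calendar day on which that session starts.
    """
    today = session_start.date()
    groups = {"before_today": [], "today_before_session": [], "current_session": []}
    for ts, url in events:
        if ts.date() < today:
            groups["before_today"].append(url)
        elif ts < session_start:
            groups["today_before_session"].append(url)
        else:
            groups["current_session"].append(url)
    return groups

# Hypothetical browsing events, for illustration only.
events = [
    (datetime(2012, 10, 19, 9, 0), "a.example/page"),
    (datetime(2012, 10, 20, 8, 30), "b.example/page"),
    (datetime(2012, 10, 20, 13, 5), "c.example/page"),
]
history = partition_history(events, session_start=datetime(2012, 10, 20, 13, 0))
```

Each group could then be weighted differently when a profile is built, which is the point of keeping them apart.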

2.2 Language Modeling Approaches

The language modeling approach to information retrieval has emerged within the past several years as a new probabilistic framework for describing information retrieval processes. It provides a structural representation that describes the main topics and their organization within a given domain of discourse. Modeling language structure is particularly relevant for domains that exhibit recurrent patterns in content organization, such as news and encyclopedia articles. Computing language models for an arbitrary domain is a challenging task due to the lack of explicit, unambiguous structural markers. Moreover, texts within the same domain may vary in topic selection and ordering, and this variability further complicates the discovery of language structure.

Applied to information retrieval, language modeling refers to the problem of estimating the likelihood that a query and a document could have been generated by the same language model. More simply, content modeling refers to the task of estimating a probability distribution that captures the statistical regularities of the content used. There has been a lot of work [12] on language modeling for information retrieval and related applications. Despite this progress, little work has explicitly captured information about the user and the context of the information retrieval process.
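As a concrete illustration of this idea, the sketch below scores documents by the log-likelihood of the query under a unigram document model. Jelinek-Mercer smoothing, the mixing weight `lam` and the toy documents are assumptions made for the example; the text above does not commit to a particular estimator.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.5):
    """Log P(query | doc) under a unigram model with Jelinek-Mercer smoothing.

    Each term probability mixes the document estimate with the
    collection-wide estimate; `lam` is an assumed mixing weight.
    """
    doc_tf, col_tf = Counter(doc), Counter(collection)
    doc_len, col_len = len(doc), len(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_col = col_tf[term] / col_len if col_len else 0.0
        p = lam * p_doc + (1 - lam) * p_col
        if p == 0.0:
            return float("-inf")  # term unseen everywhere
        score += math.log(p)
    return score

# Toy documents, for illustration only.
docs = {
    "d1": "rock music band guitar".split(),
    "d2": "rock climbing mountain rope".split(),
}
collection = [term for doc in docs.values() for term in doc]
query = "rock guitar".split()
ranked = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, docs[name], collection),
    reverse=True,
)
```

Here `d1` ranks above `d2` because it contains both query terms, while `d2` relies on the smoothed collection estimate for "guitar".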

Shen et al. [16] proposed a decision-theoretic framework for implicit user modeling in personalized search. They consider the short-term context when modeling the user: a language model is computed from the short-term history and is used to improve retrieval performance. In [17], the long-term search history of users is mined and language models are computed from it, which are then used to improve retrieval performance.

2.3 Collaborative Filtering Based Approaches

The growing popularity of social networks calls for improved web search that groups users with similar interests. Community sites log a huge amount of history data from users' daily interactions, and the repetition of queries among users within a community can be exploited. Similar queries and their corresponding clicked documents are used either to recommend documents or to improve search results. Different approaches have been proposed, varying in the way communities are defined and in the learning techniques used to discover communities and to build user and community profiles.

Chidlovskii et al. [18] describe the architecture of a system that performs collaborative re-ranking of search results. The user and community profiles are built from the documents marked as relevant by the user or the community, respectively. These profiles essentially contain terms and their weights. Search results are re-ranked using the term weights with an adapted cosine function. The search process and the ranking of relevant documents are thus carried out from the point of view of a particular user or community.
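A minimal sketch of profile-based re-ranking follows. It uses the plain cosine similarity between a weighted term profile and each result's text, not the adapted cosine function of [18], whose exact form is not given here. The profile weights, snippets and URLs are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank_by_profile(results, profile):
    """Order (url, text) results by similarity to the profile.

    sorted() is stable, so results with equal scores keep the
    search engine's original order.
    """
    def score(result):
        _, text = result
        return cosine(profile, Counter(text.lower().split()))
    return sorted(results, key=score, reverse=True)

# Hypothetical profile and snippets, for illustration only.
profile = {"climbing": 2.0, "rope": 1.0}
results = [
    ("music.example", "rock music band"),
    ("climb.example", "rock climbing rope"),
]
reranked = rerank_by_profile(results, profile)
```

The same function serves a user profile or a community profile, since both are described above as term-weight collections.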

Armin Hust [19] performed query expansion by using previous search queries by one or more users and their relevant documents. This query expansion method reconstructs the query as a linear combination of existing old queries. The terms of the relevant documents of these existing old queries are used for query expansion. However, the approach does not take the user into account.

Lin et al. [15] presented an approach to personalized web search based on Probabilistic Latent Semantic Analysis (PLSA), a statistical latent class technique that is a probabilistic counterpart of Latent Semantic Analysis. By mining the users' web logs, they extracted co-occurrence triples consisting of a user, a query, and the web pages viewed for that query, and modeled the latent semantic relationship between them using PLSA.

2.4 Machine Learning Based Approaches

Personalized search is one problem for which machine learning algorithms have recently received wide attention, since they can learn functions that perform the desired operations when trained on a sufficient amount of data. Given the previous history of a user, we can learn a model that represents the user. This makes user modeling a natural application for machine learning.

Joachims et al. [22][6] proposed Ranking SVM, a variation of the Support Vector Machine learning algorithm that can learn from the partial feedback present in users' clickthrough data. Teevan et al. [7] proposed a personalized search that uses a user profile based on a desktop search index, and uses rankingNets to learn the user profile. They treat the user profile as implicit feedback and incorporate it into the ranking of web search results.

It has been found that combining a machine learning approach to personalized search ranking with the original web ranking can achieve better results than the original ranking alone.

3. FRAMEWORK FOR PERSONALIZATION

Our framework (Fig. 1) implements a re-ranking process and enables effective personalization using the user's query log and clickthrough data. The framework consists of five components: Request Handler, Query Processor, Result Handler, Event Handler and Response Handler. It has log repositories that store user query logs and clickthrough data; the format of the data is explained in Section 4. In Fig. 1, dashed arrows represent data transactions with the database and solid arrows represent process communication between the components. Square blocks represent components and circles represent the internal process blocks of a component.

The Result Handler is the core component of the framework. It receives the search engine result lists, applies the re-ranking approach (explained in Section 3.1) using the query log and clickthrough data, and generates the personalized result that is sent to the user. The Request Handler handles user query requests and manages the request load by queuing requests. The Query Processor is a key component that processes the user query and prepares the keyword phrases that are posed to the search engine. It also updates the user ID, query and keywords in the query log database. The Response Handler presents the personalized results received from the Result Handler. The Event Handler is an event listener that listens for click events on result links. It captures the URL and rank of the clicked link and updates the clickthrough database.

[Figure 1. Components: User, Request Handler, Query Processor, Search Engines, Result Handler (Re-Ranking Process), Response Handler, Event Handler, and the Query Log and Clickthrough Data store. Labeled flows: Query, Search Result, Personalized Result, Result, Link Clicked.]

Fig. 1: Framework for Personalization Using Query Log and Clickthrough Data

When a user submits a query, it is received by the Request Handler. The submitted query might not be in a form suitable for submission to a search engine, so the Query Processor filters it and prepares the keywords and phrases that are submitted to the search engine; at the same time it updates the query log database. When search results are retrieved from the search engines, the Result Handler removes duplicate results and organizes the rest according to the search engine ranking. The organized results then undergo the re-ranking process, supported by the query log and clickthrough data recorded for this user and query. The re-ranking process reorders the results according to the user's past interests, and a relevant personalized result is presented to the user. Presentation of the personalized result is handled by the Response Handler. The presented result may still not be fully relevant to the user's needs. To make it more precise, each result link is bound to a click event. When a link on the page is clicked, the Event Handler records the clicked link data in the clickthrough database. As these activities continue, the relevance of the personalization improves, because the clickthrough data recorded against a query grows for each user.
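Two of the steps in this flow, duplicate removal in the Result Handler and click recording in the Event Handler, can be sketched as follows. This is a simplified illustration: the in-memory list stands in for the clickthrough database, exact URL equality is assumed as the duplicate test, and the function names and URLs are hypothetical.

```python
from datetime import datetime

# Stand-in for the clickthrough database (assumption: a plain list).
clickthrough_log = []

def dedupe_results(urls):
    """Drop repeated URLs while keeping the search engine's order."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique

def record_click(anon_id, query, url, rank, when=None):
    """Append one clicked-link record, as the Event Handler would."""
    clickthrough_log.append({
        "AnonID": anon_id,
        "Query": query,
        "QueryTime": when or datetime.now(),
        "Rank": rank,
        "ClickedURL": url,
    })

# Hypothetical result list with one duplicate.
results = dedupe_results(["a.example", "b.example", "a.example", "c.example"])
record_click("System1", "Data Mining", results[1], rank=2)
```

Every recorded click enlarges the data available to the re-ranking step the next time the same user poses the same query.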

Our framework uses clickthrough data recorded in search engine logs to simulate user experience in web information searching. In general, when a user poses a query, the user checks the list of result links from top to bottom, clicks one or more links that look relevant, and skips those that do not. Effective information retrieval is achieved when a personalization approach re-ranks the relevant links and places them higher in the result list. We therefore use user clicks as the relevance measure for evaluating search accuracy. Since clickthrough data can be collected straightforwardly and with little effort, large-scale evaluation is possible under this framework. Furthermore, clickthrough data reflects the real-world distribution of queries, users and search scenarios. Evaluating personalized search with clickthrough data is thus closer to real use than evaluating it with user surveys.

3.1 Re-Ranking Approach

The framework implements a re-ranking algorithm based on the following assumption: when a user U submits a query Q, the returned web links that U has clicked repeatedly in the past are more relevant to U than the links that U has rarely clicked. On this basis, a new ranking value can be computed for each result link L retrieved when the user poses the query Q.

Let us assume that posing a query Q returns a result set SR containing the top 10 results (R1, R2, R3, R4, R5, R6, R7, R8, R9, R10), where the order R1 ... R10 is the current rank returned by the search engine.

We assume that a large number of clickthrough log records have been recorded for the users in the log database. The relevant clickthrough records, CS, are extracted based on the user identification U and the query Q.

(1)

For the search result SR obtained from the search engine, an average rate, Rrate, needs to be calculated using CS in order to re-rank the results. To compute Rrate, each record in SR is checked against the CS data records.

(2)

(3)

where Rr is the sum of the rate count for record Ri, and n is the number of records in CS.

Based on the Rrate of each search result, the results are sorted in descending order to give the user's personalized result, Pres.

(4)

where Ri is the i-th search record in SR, for i = 1 up to the number of search records in SR, and Rrate is the average rate of each record Ri.
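Read together, the definitions of CS, Rr, Rrate, n and Pres in this section suggest the following form for the four quantities numbered (1) to (4). This is one reading under stated assumptions rather than the authors' exact notation: the log symbol $\mathcal{L}$, the accessor functions user, query and link, and the indicator bracket are introduced here for illustration only.

```latex
\begin{align}
CS &= \{\, c \in \mathcal{L} \;:\; \mathrm{user}(c) = U \ \wedge\ \mathrm{query}(c) = Q \,\} \\
Rr_i &= \sum_{j=1}^{n} \mathbf{1}\big[\, \mathrm{link}(c_j) = \mathrm{link}(R_i) \,\big],
        \qquad c_j \in CS \\
Rrate_i &= \frac{Rr_i}{n}, \qquad n = |CS| \\
P_{res} &= \mathrm{sort}_{\downarrow}\Big( \big\{ (R_i,\ Rrate_i) \big\}_{i=1}^{|SR|} \Big)
\end{align}
```

Under this reading, Rrate is the fraction of the user's recorded clicks for the query Q that landed on result Ri. It is undefined when CS is empty, in which case the search engine order would simply be kept.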

4. EXPERIMENTAL RESULTS

To evaluate the search performance of the personalization framework, each participating user must issue a fixed number of test queries and judge whether each result is relevant or not. An advantage of this approach is that the relevance of documents is judged explicitly by the participants. However, the limited number of participants and test queries may make the evaluation of the accuracy and reliability of the personalization algorithm inconclusive.

4.1 Training Data

Clickthrough data are denoted as a triplet (q, r, c), where q is the input query consisting of a set of keywords, r is a list of ranked links (l1, ..., ln), and c is the set of links that the user has clicked. The data in the collection are sorted by anonymous user ID and arranged sequentially. The data set includes the fields AnonID, Query, QueryTime, Rank and Clicked URL.
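The triplet (q, r, c) maps naturally onto a small record type. The sketch below is one possible representation; the class name, field names and the validation rule that every clicked link must appear in the ranked list are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClickthroughTriplet:
    query: tuple          # q: the query keywords
    ranked_links: tuple   # r: links in ranked order (l1, ..., ln)
    clicked: frozenset    # c: the links the user clicked

    def __post_init__(self):
        # Assumed consistency check: c must be a subset of r.
        if not self.clicked <= set(self.ranked_links):
            raise ValueError("clicked links must come from the ranked list")

    def clicked_ranks(self):
        """1-based ranks of the clicked links, in ascending order."""
        return sorted(
            position + 1
            for position, link in enumerate(self.ranked_links)
            if link in self.clicked
        )

# Hypothetical triplet: the user clicked the first and third links.
triplet = ClickthroughTriplet(
    query=("data", "mining"),
    ranked_links=("l1", "l2", "l3"),
    clicked=frozenset({"l1", "l3"}),
)
ranks = triplet.clicked_ranks()
```

Flattening such a triplet into one row per clicked link yields records with the AnonID, Query, QueryTime, Rank and Clicked URL fields listed above, once the user ID and time-stamp are attached.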

Table 1: A snapshot of the clickthrough data

| AnonID | Query | QueryTime | Rank | Clicked URL |
|---|---|---|---|---|
| System1 | Data Mining | 2012-10-20 13:02:04 | | www.thearling.com/text/dmwhite/dmwhite.htm |
| System1 | Data Mining | 2012-10-20 13:23:08 | | www-users.cs.umn.edu/~kumar/dmbook/index.php |
| System1 | Data Classification | 2012-10-20 18:05:27 | | www.en.wikipedia.org/wiki/Data_classification |
| System1 | Data Classification | 2012-10-20 18:05:29 | | www.computereconomics.com/article.cfm?id=1117 |

To create the clickthrough database, we submitted five different queries to five different search engines and performed click events on the result pages returned by each search engine in order to record clickthrough results. We repeated this activity to build a clickthrough database of 1000 records. The records are stored in the database in the columns shown in Table 1.

4.2 Test Data Evaluation

To evaluate the framework, we submitted the query "Data Visualization Tool" to a search engine. The returned results are handled by the Result Handler, which applies the re-ranking process for personalization. From the Result Handler we collect the top 10 results in their rank order as SR = {R1, R2, R3, R4, R5, R6, R7, R8, R9, R10}.

We run a rule-based classification query on the clickthrough database to obtain the relevant results. The rule uses a precondition on matching attributes to select the required records based on the inputs AnonID and Query. We define the rule that obtains the clickthrough data CR as follows:

CR = (AnonID = 'input AnonID') ^ (Query = 'input Query')

Stored records are returned only if the condition in the rule is true for the given attributes. The resulting SR and CR are used in the re-ranking process.
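The selection rule amounts to a conjunctive filter over the clickthrough records. A minimal sketch, using the rows of Table 1 as data, is given below. Storing records as dictionaries, the function name `select_clicks` and exact, case-sensitive matching are assumptions; the Rank field is omitted because its values are not available in Table 1.

```python
def select_clicks(log, anon_id, query):
    """Return the records whose AnonID and Query both match the inputs."""
    return [
        record for record in log
        if record["AnonID"] == anon_id and record["Query"] == query
    ]

# Rows taken from Table 1.
log = [
    {"AnonID": "System1", "Query": "Data Mining",
     "QueryTime": "2012-10-20 13:02:04",
     "ClickedURL": "www.thearling.com/text/dmwhite/dmwhite.htm"},
    {"AnonID": "System1", "Query": "Data Mining",
     "QueryTime": "2012-10-20 13:23:08",
     "ClickedURL": "www-users.cs.umn.edu/~kumar/dmbook/index.php"},
    {"AnonID": "System1", "Query": "Data Classification",
     "QueryTime": "2012-10-20 18:05:27",
     "ClickedURL": "www.en.wikipedia.org/wiki/Data_classification"},
    {"AnonID": "System1", "Query": "Data Classification",
     "QueryTime": "2012-10-20 18:05:29",
     "ClickedURL": "www.computereconomics.com/article.cfm?id=1117"},
]
cr = select_clicks(log, "System1", "Data Mining")
```

With a relational store the same rule would typically be a WHERE clause on the two columns; the list comprehension keeps the example self-contained.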

To re-rank each search result in SR, its average rate Rrate needs to be calculated. To calculate Rrate we first need to compute the Rr value. To obtain Rr, the CR data is iterated over for each record in SR, as follows.

For (Each Search_Record in SR) do
    Initialization: (Lcount = 0)
    For (Each Click_Record in CR) do
        If (Search_Record Link == Clicked_Link) Then
            Lcount = Lcount + 1
        End If
    End For
    Rrate = Lcount / n    (n = number of records in CR)
End For

Fig. 2: Algorithm for average rate calculation for each record of the search results

Running the algorithm in Fig. 2 gives the Rrate of each result obtained from the search engine. We then sort the results in descending order of Rrate to reorganize and re-rank them for personalization. The re-ranked result is a personalized result that reflects the user's past clickthrough log data and search query.
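The counting and sorting steps can be put together in a short Python sketch. It assumes that Rrate is the click count of a link divided by the number of clickthrough records for the user and query, that links are compared by exact URL equality, and that ties keep the search engine order. The function name `rerank` and the example URLs are hypothetical.

```python
from collections import Counter

def rerank(search_results, clicked_links):
    """Re-rank result links by their average click rate.

    search_results: result links in search engine order (SR).
    clicked_links: clicked links recorded for this user and query (CR).
    Returns the personalized order and the rate of each link.
    """
    n = len(clicked_links)
    counts = Counter(clicked_links)  # click count per link
    rates = {
        link: (counts[link] / n if n else 0.0)
        for link in search_results
    }
    # sorted() is stable, so links with equal rates keep engine order.
    personalized = sorted(search_results, key=lambda link: rates[link], reverse=True)
    return personalized, rates

# Hypothetical search results and click history, for illustration only.
sr = ["r1.example", "r2.example", "r3.example", "r4.example", "r5.example"]
cr = ["r3.example", "r3.example", "r5.example", "r3.example"]
personalized, rates = rerank(sr, cr)
```

In this example the third and fifth results move to the top because they are the only ones with recorded clicks, and the remaining results stay in their original order. Counting clicks once with `Counter` avoids the nested loop over SR and CR while producing the same counts.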

5. CONCLUSIONS

Results that are highly relevant to a specific query may be ranked lower in the original result list because the data needed to establish their relevance and to re-rank them is lacking. In this paper, we proposed an evaluation framework based on query logs and clickthrough data to support and evaluate personalized search. To evaluate the framework, we created a clickthrough database covering various queries and ran a test query for re-ranking and personalization. The experimental results support the view that click-based personalization algorithms work well in satisfying user needs and reducing search time. The algorithms work efficiently for re-ranking and user personalization. Our future work will focus on online evaluation of the framework and on developing an automatic prediction algorithm based on user profiles and feedback.

REFERENCES

[1] Mirco Speretta and Susan Gauch. Personalizing search based on user search histories. In Thirteenth International Conference on Information and Knowledge Management (CIKM 2004), 2004.

[2] K. Sugiyama, K. Hatano, and M. Yoshikawa, "Adaptive Web Search Based on User Profile Constructed without Any Effort from Users," Proc. 13th International World Wide Web Conf. (WWW '04), pp. 675-684, 2004.

[3] Chien-Kang Huang, Lee-Feng Chien, and Yen-Jen Oyang. Relevant term suggestion in interactive web search based on contextual information in query session logs. JASIST 54(7): 638-649, 2003.

[4] B. Tan, X. Shen, and C. Zhai, "Mining Long-Term Search History to Improve Search Accuracy," Proc. 12th ACM SIGKDD International Conf. Knowledge Discovery and Data Mining (KDD '06), pp. 718-723, 2006.

[5] D. Beeferman and A. Berger. Agglomerative clustering of a search engine query log. In Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000.

[6] Thorsten Joachims. Optimizing search engines using clickthrough data. In KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 133-142, New York, NY, USA, 2002. ACM Press.

[7] J. Teevan, S. T. Dumais, and E. Horvitz. Personalizing search via automated analysis of interests and activities. In Proceedings of SIGIR 2005, 2005.

[8] Ji-Rong Wen, Jian-Yun Nie, and Hong-Jiang Zhang. Query clustering using user logs. ACM Transactions on Information Systems (TOIS), 20(1):59-81, 2002.

[9] Natalie S. Glance. Community search assistant. In Proceedings of the International Conference on Intelligent User Interfaces, pages 91-96. ACM Press, 2001.

[10] J.-R. Wen, J.-Y. Nie, and H.-J. Zhang. Clustering user queries of a search engine. In Proceedings of the Tenth International World Wide Web Conference, Hong Kong, May 2001.

[11] Larry Fitzpatrick and Mei Dent. Automatic feedback using past queries: Social searching? In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 306-313. ACM Press, 1997.

[12] V. Lavrenko, M. Choquette, and W. B. Croft. Cross-lingual relevance models. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001.

[13] J. Teevan, S.T. Dumais, and E. Horvitz, "Personalizing Search via Automated Analysis of Interests and Activities," Proc. 28th Ann. International ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '05), pp. 449-456, 2005.

[14] F. Liu, C. Yu, and W. Meng, "Personalized Web Search for Improving Retrieval Effectiveness," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 1, pp. 28-40, Jan. 2004.

[15] Henxi Lin, Gui-Rong Xue, Hua-Jun Zeng, and Yong Yu. Using probabilistic latent semantic analysis for personalized web search. In Proceedings of APWEB'05, 2005.

[16] X. Shen, B. Tan, and C. Zhai. Context-sensitive information retrieval using implicit feedback. In Proceedings of SIGIR 2005, pages 43-50, 2005.

[17] B. Tan, X. Shen, and C. Zhai. Mining long term search history to improve search accuracy. In Proceedings of 2006 ACM Conference on Knowledge Discovery and Data Mining (SIGKDD'2006), pages 718-723, 2006.

[18] Boris Chidlovskii, Nathalie Glance, and Antonietta Grasso. Collaborative re-ranking of search results. In Proc. AAAI-2000 Workshop on AI for Web Search, 2000.

[19] Armin Hust. Query expansion methods for collaborative information retrieval. Inform., Forsch. Entwickl., 19(4):224-238, 2005.

[20] P.A. Chirita, C. Firan, and W. Nejdl, "Summarizing Local Context to Personalize Global Web Search," Proc. ACM International Conf. Information and Knowledge Management (CIKM), 2006.

[21] J.-T. Sun, H.-J. Zeng, H. Liu, Y. Lu, and Z. Chen, "CubeSVD: A Novel Approach to Personalized Web Search," Proc. 14th International World Wide Web Conf. (WWW '05), pp. 382-390, 2005.

[22] Filip Radlinski and Thorsten Joachims. Evaluating the robustness of learning from implicit feedback. In ICML Workshop on Learning In Web Search, 2005.
