Categories Of Data Used In Web Personalization


02 Nov 2017


Department of Production Engineering, Government Engineering College, Bhavnagar, Gujarat, India

ABSTRACT:

In this paper, the author surveys Web personalization in terms of Web content, Web usage, and Web structure, and presents personalization as a remedy for the information overload encountered when searching and browsing the WWW. The author also describes the goals and modes of personalization and the main thrust of data personalization. Because of the explosive proliferation of the Web, Web personalization has recently gained a big share of attention, and significant strides have already been made toward WWW personalization despite tough challenges.

I. INTRODUCTION

The Web information age has brought a dramatic increase in the sheer amount of information (Web content), the access to this information (Web usage), as well as the intricate complexities governing the relationships within this information (Web structure). Hence, not surprisingly, information overload, when searching and browsing the WWW, has become the "plague du jour". One of the most promising and potent remedies against this plague comes in the form of personalization. Personalization aims to customize the interactions on a website depending on the user’s explicit and/or implicit interests and desires.

II. BACKGROUND

The Birth of Personalization: No Longer an Option, But a Necessity

The move from traditional physical stores of products or information (such as grocery stores or libraries) to virtual stores of products or information (such as e-commerce sites and digital libraries) has practically eliminated the physical constraints that traditionally limited the number and variety of products in a typical inventory. Unfortunately, the same move has reduced the traditional three-dimensional layout of products, where access is further facilitated by a sales representative or librarian who knows the products and the customers, to a dismal planar interface with no sales representative or librarian. As a result, customers drown in a huge number of options, most of which they may never even get to know. In the late 1990s, Jeff Bezos, CEO of Amazon™, said, "If I have 3 million customers on the Web, I should have 3 million stores on the Web" (Schafer et al., 1999). Hence, in both the e-commerce sector and digital libraries, Web personalization has become more of a necessity than an option. Personalization can be used to achieve several goals, ranging from increasing customer loyalty on e-commerce sites (Schafer et al., 1999) to enabling better search (Joachims T., 2002).

Possible Goals of Web Personalization

      Converting browsers into buyers

     Improving website design and usability

     Improving customer retention and loyalty

     Increasing cross-sell by recommending items related to the ones being considered

     Helping visitors to quickly find relevant information on a website

     Making results of information retrieval/search more aware of the context and user interests

III. Modes of Personalization

Personalization falls into four basic categories, ordered from the simplest to the most advanced:

(1) Memorization – In this simplest and most widespread form of personalization, user information such as name and browsing history is stored (e.g. using cookies), to be later used to recognize and greet the returning user. It is usually implemented on the Web server. This mode depends more on Web technology than on any kind of adaptive or intelligent learning. It can also jeopardize user privacy.

(2) Customization – This form of personalization takes as input a user’s preferences from registration forms in order to customize the content and structure of a web page. This process tends to be static and manual or at best semi-automatic. It is usually implemented on the Web server. Typical examples include personalized web portals such as My Yahoo!™.

(3) Guidance or Recommender Systems – A guidance based system tries to automatically recommend hyperlinks that are deemed to be relevant to the user’s interests, in order to facilitate access to the needed information on a large website (Schafer et al., 1999; Mobasher et al., 2000; Nasraoui et al., 2002). It is usually implemented on the Web server, and relies on data that reflects the user’s interest implicitly (browsing history as recorded in Web server logs) or explicitly (user profile as entered through a registration form or questionnaire). This approach will form the focus of our overview of Web personalization.

(4) Task Performance Support – In these client-side personalization systems, a personal assistant executes actions on behalf of the user, in order to facilitate access to relevant information. This approach requires heavy involvement on the part of the user, including access, installation, and maintenance of the personal assistant software. It also has very limited scope in the sense that it cannot use information about other users with similar interests.

In the following, we concentrate on the third mode of personalization, namely automatic Web personalization based on recommender systems, because these systems require little or no explicit input from the user. Also, since they are implemented on the server side, they benefit from a global view of all users’ activities and interests, which lets them provide a Web personalization experience that is intelligent (user profiles are learned automatically) and yet transparent (very little or no explicit input is required from the user).
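As a rough illustration of how a server-side recommender can exploit a global view of all users' activity, the sketch below ranks pages by how often they co-occur in past sessions with the pages the current visitor has already seen. It is only one simple heuristic, not the method of any of the cited papers, and the function names and sample URLs are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count how often two pages appear together in the same session."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Each unordered pair of distinct pages is counted once per session.
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, current_session, top_n=3):
    """Rank unseen pages by total co-occurrence with the pages already visited."""
    seen = set(current_session)
    scores = Counter()
    for page in seen:
        for other, n in counts[page].items():
            if other not in seen:
                scores[other] += n
    # Sort by score (descending), then by URL so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:top_n]]


# Hypothetical past sessions, e.g. reconstructed from Web server logs.
past_sessions = [
    ["/home", "/courses", "/courses/ml"],
    ["/home", "/courses", "/admissions"],
    ["/courses", "/courses/ml", "/faculty"],
]
counts = build_cooccurrence(past_sessions)
suggestions = recommend(counts, ["/courses"])
print(suggestions)
```

No explicit input is asked of the visitor here: the suggestions come entirely from the recorded sessions of other users.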

IV. MAIN THRUST

1. Phases of Automatic Web Personalization

The Web personalization process can be divided into four distinct phases (Schafer et al., 1999; Mobasher et al., 2000):

(1) Collection of Web data – Implicit data includes past activities/click streams as recorded in Web server logs and/or via cookies or session tracking modules. Explicit data usually comes from registration forms and rating questionnaires. Additional data such as demographic and application data (for example, e-commerce transactions) can also be used. In some cases, Web content, structure, and application data can be added as additional sources of data, to shed more light on the next stages.

(2) Preprocessing of Web data – Data is frequently pre-processed to put it into a format that is compatible with the analysis technique to be used in the next step. Preprocessing may include cleaning data of inconsistencies, filtering out irrelevant information according to the goal of analysis (example: automatically generated requests to embedded graphics will be recorded in web server logs, even though they add little information about user interests), and completing the missing links (due to caching) in incomplete clickthrough paths. Most importantly, unique sessions need to be identified from the different requests, based on a heuristic, such as requests originating from an identical IP address within a given time period.

(3) Analysis of Web data – Also known as Web Usage Mining (Spiliopoulou and Faulstich, 1999; Nasraoui et al., 1999; Srivastava et al., 2000), this step applies machine learning or Data Mining techniques to discover interesting usage patterns and statistical correlations between web pages and user groups. This step frequently results in automatic user profiling, and is typically applied offline, so that it does not add a burden on the web server.

(4) Decision making/Final Recommendation Phase – The last phase in personalization makes use of the results of the previous analysis step to deliver recommendations to the user. The recommendation process typically involves generating dynamic Web content on the fly, such as adding hyperlinks to the last web page requested by the user. This can be accomplished using a variety of Web technology options such as CGI programming. 
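The preprocessing phase described above can be sketched in a few lines. The example below drops requests for embedded graphics and then groups the remaining requests into sessions using the heuristic mentioned in the text: requests from the same IP address belong to one session as long as the gap between them stays within a time limit. The 30-minute timeout, the list of ignored file suffixes, and the sample log entries are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Assumed values: suffixes treated as embedded resources, and the inactivity limit.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (ip, timestamp, url) requests into sessions per IP address."""
    sessions = []       # list of (ip, [urls]) in order of session start
    open_session = {}   # ip -> (index into sessions, time of last kept request)
    for ip, ts, url in sorted(requests, key=lambda r: r[1]):
        if url.lower().endswith(IGNORED_SUFFIXES):
            continue  # automatically generated request, says little about interests
        previous = open_session.get(ip)
        if previous is None or ts - previous[1] > timeout:
            sessions.append((ip, []))  # start a new session for this IP
            index = len(sessions) - 1
        else:
            index = previous[0]
        sessions[index][1].append(url)
        open_session[ip] = (index, ts)
    return sessions


# Hypothetical log entries.
t0 = datetime(2017, 11, 2, 9, 0, 0)
log = [
    ("10.0.0.1", t0, "/home"),
    ("10.0.0.1", t0 + timedelta(seconds=10), "/logo.gif"),
    ("10.0.0.1", t0 + timedelta(minutes=2), "/courses"),
    ("10.0.0.2", t0 + timedelta(minutes=3), "/home"),
    ("10.0.0.1", t0 + timedelta(hours=2), "/admissions"),
]
sessions = sessionize(log)
print(sessions)
```

An IP address is only an approximation of a user (several visitors may share one address), which is why the text calls this a heuristic.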

V. Categories of Data Used in Web Personalization

The Web personalization process relies on one or more of the following data sources (Eirinaki and Vazirgiannis, 2003):

(1) Content Data – Text, images, etc., in HTML pages, as well as information in databases.

(2) Structure Data – Hyperlinks connecting the pages to one another.

(3) Usage Data – Records of the visits to each web page on a website, including time of visit, IP address, etc. This data is typically recorded in Web server logs, but it can also be collected using cookies or other session tracking tools.

(4) User Profile – Information about the user, including demographic attributes (age, income, etc.) and preferences that are gathered either explicitly (through registration forms) or implicitly (through Web server logs). Profiles can be either static or dynamic. They can also be individualized (one per user) or aggregate (summarizing several similar users in a given group).

VI. CONCLUSION

Because of the explosive proliferation of the Web, Web personalization has recently gained a big share of attention, and significant strides have already been made toward WWW personalization despite tough challenges. However, even in this slowly maturing area, some newly identified challenges call for increased efforts in developing scalable and accurate web mining and personalization models that can stand up to huge, possibly noisy, and highly dynamic web activity data. Along with some crucial challenges, we have also pointed to some possible future directions in the area of WWW personalization.
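The four categories of data listed in Section V can be pictured as simple record types. The sketch below is one possible representation, not a schema taken from the cited literature; all class names, field names, and sample values are hypothetical. It also shows a dynamic profile being updated implicitly from usage records.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ContentData:
    url: str
    text: str  # text extracted from the HTML page


@dataclass
class StructureData:
    source_url: str  # page containing the hyperlink
    target_url: str  # page the hyperlink points to


@dataclass
class UsageRecord:
    ip_address: str
    timestamp: datetime
    url: str


@dataclass
class UserProfile:
    user_id: str
    demographics: dict = field(default_factory=dict)  # explicit, e.g. from a form
    interests: dict = field(default_factory=dict)     # implicit, URL -> weight
    is_aggregate: bool = False                        # True if it summarizes a group


def update_profile(profile, record, weight=1.0):
    """Update a dynamic profile implicitly from one usage record."""
    profile.interests[record.url] = profile.interests.get(record.url, 0.0) + weight
    return profile


# Hypothetical example: one user, three logged visits.
profile = UserProfile(user_id="u1", demographics={"age": 30})
now = datetime(2017, 11, 2, 9, 0, 0)
for url in ["/courses", "/home", "/courses"]:
    update_profile(profile, UsageRecord("10.0.0.1", now, url))
print(profile.interests)
```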

VII. Challenges in WWW Personalization

WWW personalization faces several tough challenges that distinguish it from the main stream of data mining:

(1) Scalability – In order to deal with large websites that have huge activity, personalization systems need to be scalable, i.e. efficient in their time and memory requirements. To this end, some researchers (Nasraoui et al., 2003) have started treating web usage data as a special case of noisy data streams: data that arrives continuously in an environment constrained by stringent memory and computational resources, so that it can only be processed and analyzed sequentially and cannot be stored.

(2) Accuracy – WWW personalization poses an enormous risk of upsetting users or e-commerce customers if the recommendations are inaccurate. One promising approach (Nasraoui and Pavuluri, 2004) in this direction is to add a data mining phase that is separate from the one used to discover user profiles by clustering previous user sessions, and whose main purpose is to learn an accurate recommendation model. This approach differs from existing methods that do not include adaptive learning in a separate second phase, and instead base the recommendations on simplistic assumptions (e.g. nearest profile recommendations, or deployment of pre-discovered association rules). Based on this approach, a new method was developed for generating recommendations that are simultaneously accurate and complete, called Context Ultra-Sensitive Approach based on two-step Recommender systems (CUSA-2-step-Rec) (Nasraoui and Pavuluri, 2004). CUSA-2-step-Rec relies on a committee of profile-specific URL-predictor neural networks. The networks are accurate and fast to train because only the URLs relevant to a specific profile are used to define the architecture of each network. Similar to the task of completing the missing pieces of a puzzle, each neural network is trained to predict the missing URLs of several complete ground-truth sessions from a given profile, given as input several incomplete subsessions. This is the first approach that, in a sense, personalizes the recommendation modeling process itself depending on the user profile.

(3) Evolving User Interests – Dealing with rapidly evolving user interests and highly dynamic websites requires a migration of the complete web usage mining phases from an offline framework to one that is completely online. This can only be accomplished with scalable single-pass evolving stream mining techniques (Nasraoui et al., 2003). Other researchers have also studied web usage from the perspective of evolving graphs (Desikan and Srivastava, 2004).

(4) Data Collection and Preprocessing – Preprocessing Web usage data is still imperfect, mainly because of the difficulty of identifying users accurately in the absence of registration forms and cookies, and because of log requests that are missing due to caching. Some researchers (Berendt et al., 2001) have proposed clickstream path completion techniques that can correct problems of accesses that do not get recorded due to client caching.

(5) Integrating Multiple Sources of Data – Taking semantics into account can also enrich the Web personalization process in all its phases. A focus on techniques and architectures for more effective integration and mining of content, usage, and structure data from different sources is likely to lead to the next generation of more useful and more intelligent applications (Li J. and Zaiane O., 2004). In particular, there has recently been an increasing interest in integrating web mining with ideas from the semantic web, leading to what is known as semantic web mining (Berendt et al., 2002).

(6) Conceptual Modeling for Web Usage Mining – Conceptual modeling of the web mining and personalization process is also receiving more attention as web mining becomes more mature and also more complicated. Recent efforts in this direction include Meo et al. (2004) and Maier (2004).

(7) Privacy Concerns – Finally, privacy adds a whole new dimension to WWW personalization. In reality, many users dislike giving away personal information. Some may also be suspicious of websites that rely on cookies, and may even block cookies. In fact, even if a web user agrees to give up personal information or accept cookies, there is no guarantee that websites will not exchange this information without the user’s consent. Recently, the W3C (World Wide Web Consortium) has proposed recommendations for a standard, called Platform for Privacy Preferences (P3P), that enables websites to express their privacy practices in a format that can be retrieved and interpreted by client browsers. However, legal efforts are still needed to ensure that websites truly comply with their published privacy practices. For this reason, several research efforts (Agrawal and Srikant, 2000; Kargupta et al., 2003) have attempted to protect privacy by masking the user data using methods such as randomization, which modify the input data without significantly altering the results of data mining. The use of these techniques within the context of Web mining is still open for future research.
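The idea behind randomization in item (7) can be illustrated with a toy example: each individual value is masked with random noise, yet an aggregate statistic computed over many users barely changes. The sketch below uses synthetic ages, uniform zero-mean noise, and a fixed seed, all of which are assumptions made for illustration. It shows only the general principle and is not the distribution-reconstruction procedure of the cited papers.

```python
import random
from statistics import mean


def randomize(values, spread, rng):
    """Mask each value by adding independent uniform noise in [-spread, spread]."""
    return [v + rng.uniform(-spread, spread) for v in values]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
true_ages = [rng.randint(18, 70) for _ in range(10_000)]  # synthetic user ages
masked_ages = randomize(true_ages, spread=20.0, rng=rng)

true_mean = mean(true_ages)
masked_mean = mean(masked_ages)

# Individual values are distorted by up to 20 years, but because the noise
# has zero mean, the two averages stay close for a large number of users.
print(round(true_mean, 2), round(masked_mean, 2))
```

The trade-off is visible in the `spread` parameter: larger noise hides individual values better but makes aggregate estimates less precise for a given number of users.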

X. REFERENCES

Agrawal R. and Srikant R. (2000). Privacy-preserving data mining, In Proc. of the ACM SIGMOD Conference on Management of Data, Dallas, Texas, 439-450.

Berendt B., Mobasher B., Spiliopoulou M., and Wiltshire J. (2001). Measuring the accuracy of sessionizers for web usage analysis, In Workshop on Web Mining, at the First SIAM International Conference on Data Mining, 7-14.

Berendt B., Hotho A., and Stumme G. (2002). Towards semantic web mining. In Proc. International Semantic Web Conference (ISWC02).

Desikan P. and Srivastava J. (2004). Mining Temporally Evolving Graphs. In Proc. of "WebKDD-2004 workshop on Web Mining and Web Usage Analysis", B. Mobasher, B. Liu, B. Masand, O. Nasraoui, Eds., part of the ACM KDD: Knowledge Discovery and Data Mining Conference, Seattle, WA.

Eirinaki M., Vazirgiannis M. (2003). Web mining for web personalization. ACM Transactions On Internet Technology (TOIT), 3(1), 1-27.

Joachims T. (2002). Optimizing search engines using clickthrough data. In Proc. of the 8th ACM SIGKDD Conference, 133-142.

Kargupta H., Datta S., Wang Q., and Sivakumar K. (2003). On the Privacy Preserving Properties of Random Data Perturbation Techniques, In Proc. of the 3rd ICDM IEEE International Conference on Data Mining (ICDM'03), Melbourne, FL.

Li J. and Zaiane O. (2004). Using Distinctive Information Channels for a Mission-based Web-Recommender System. In Proc. of "WebKDD-2004 workshop on Web Mining and Web Usage Analysis", part of the ACM KDD: Knowledge Discovery and Data Mining Conference, Seattle, WA.

Linden G., Smith B., and York J. (2003). Amazon.com Recommendations: Item-to-item collaborative filtering, IEEE Internet Computing, 7(1), 76-80.

Maier T. (2004). A Formal Model of the ETL Process for OLAP-Based Web Usage Analysis. In Proc. of "WebKDD- 2004 workshop on Web Mining and Web Usage Analysis", part of the ACM KDD: Knowledge Discovery and Data Mining Conference, Seattle, WA.

Meo R., Lanzi P., Matera M., Esposito R. (2004). Integrating Web Conceptual Modeling and Web Usage Mining. In Proc. of "WebKDD- 2004 workshop on Web Mining and Web Usage Analysis", part of the ACM KDD: Knowledge Discovery and Data Mining Conference, Seattle, WA.

Mobasher B., Cooley R., and Srivastava J. (2000). Automatic personalization based on web usage mining, Communications of the ACM, 43(8), 142–151.

Mobasher B., Dai H., Luo T., and Nakagawa M. (2001). Effective personalization based on association rule discovery from Web usage data, ACM Workshop on Web information and data management, Atlanta, GA.

Nasraoui O., Krishnapuram R., and Joshi A. (1999). Mining Web Access Logs Using a Relational Clustering Algorithm Based on a Robust Estimator, 8th International World Wide Web Conference, Toronto, 40-41.

Nasraoui O., Krishnapuram R., Joshi A., and Kamdar T. (2002). Automatic Web User Profiling and Personalization using Robust Fuzzy Relational Clustering, in "E-Commerce and Intelligent Methods" in the series "Studies in Fuzziness and Soft Computing", J. Segovia, P. Szczepaniak, and M. Niedzwiedzinski, Eds., Springer-Verlag.

Nasraoui O., Cardona C., Rojas C., and Gonzalez F. (2003). Mining Evolving User Profiles in Noisy Web Clickstream Data with a Scalable Immune System Clustering Algorithm, in Proc. of WebKDD 2003 – KDD Workshop on Web mining as a Premise to Effective and Intelligent Web Applications, Washington DC, 71-81.


