Detection Of Sentiment In Web Content

02 Nov 2017

Recent technological advances have produced major changes in the way information is retrieved. Through Web 2.0 technologies, the Internet has moved from a static presentation of information to a dynamic one that directly involves users. The Web is now a platform that allows users to interact with each other and to exchange information. Users have gone from being mere consumers of online information to active participants who add to its content.

Opinion mining and sentiment analysis techniques make it possible to analyze user-generated content automatically. Current research in this area allows the automatic identification and extraction of opinions and emotions. The information generated by Internet users has increased exponentially in recent years and has become an important source for extracting knowledge from the online environment. This sheer volume of data makes manual processing impossible, while automatic analysis faces additional difficulties because users write in informal language. Sometimes, depending on the type of text, sentence structure is absent altogether [1].

Sentiment analysis, also called opinion mining in the scientific literature, is the computational determination and classification of opinions or feelings expressed in text. An opinion presupposes an opinion holder, a target entity about which the opinion is expressed, a particular aspect of that entity, and a sentiment orientation [2].

Overview

Identifying consumer opinions about a company's products is as important as knowing sales volumes, but such opinions are often harder to obtain. Companies can no longer rely only on internal data in their business analysis. Current research in sentiment analysis addresses the need to expand the information companies collect by analyzing the huge volume of content generated on online social networks (Facebook, Twitter, Google+), in comments on e-commerce sites (Amazon.com), and in product or service reviews on specialized platforms [3].

Social networks have played an important role in the growth of user-generated information. They have changed the way information reaches potential customers, shifting the traditional one-to-many mode of communication to one-to-one communication [4]. Opinion mining techniques applied to social networks help to understand how certain products or services are perceived in the market. Marketers, recognizing the potential of social network marketing, have significantly changed the way they communicate with potential customers [5]. Studying the social environment also provides information about the products consumers need, through the feedback given in comments and reviews.

The large number of online reviews available on specialized platforms meets consumers' information needs. Reviews can be used to compare offers on the market and make an informed purchasing decision. For the average user, however, making an informed choice based on online information is difficult, because the amount of content is huge and it is impossible to browse enough of it to form an accurate picture. In addition, a user's knowledge of the metrics by which products are compared may be limited in a given domain.

Objective

This case study examines ways of applying sentiment analysis in order to identify an effective means of detecting and classifying opinions in online user reviews, and also of determining the specific aspects of the entities about which opinions are expressed. To achieve this objective, sentiment analysis is studied at several levels.

An efficient process based on sentiment analysis should classify the reviews written in a particular domain, summarize the existing opinions so that users get results that are easy to interpret, and make it easy to select the aspects of the entities that opinions are about, classifying them as positive or negative.

Proposed method

Sentiment analysis can be performed using a knowledge-based system or a machine learning system. Knowledge-based systems require rules to be created and adjusted manually, whereas machine learning systems require a significant set of training data but automate the process. The case study focuses on machine learning systems.

Sentiment analysis can be applied at several levels:

- Sentence level

- Document level

- Feature level

The opinions resulting from the analysis can be summarized as scalar values or as polarity values. This case study presents two ways of detecting the polarity of opinions extracted from online platforms. The first uses the semantic polarity of words and an algorithm that determines the semantic orientation of sentences and documents. The second uses supervised machine learning algorithms, in which the algorithm is first trained on a data set representative of the analysis domain.

Semantic orientation has the advantage of being independent of the domain under analysis, but achieving better accuracy in opinion detection requires training on a domain-specific data set.

Methodology

The opinion mining process goes through phases similar to those of a knowledge extraction process:

- In the first phase, the domain to be analyzed is determined, the appropriate data sources are identified (the web sites from which the data will be collected), and the data collection method is chosen.

- The second stage involves the actual data acquisition and the processing needed to identify opinions in the next step. At this stage the data set (user reviews from web pages) is built using a web crawler. The reviews are pre-processed to eliminate text that is irrelevant to the analysis. Only sentences in which opinions are expressed are kept; the rest are removed. In addition, not all words in a sentence are relevant to determining an opinion, so the irrelevant parts of each sentence must also be removed from the data set.

Fig 1. Stages of an opinion mining analysis: determining the domain and identifying data; acquiring and preprocessing data; classification; determining opinion polarity; results summarization; interpretation and presentation of results.

- The central stage is classification. The reviews obtained in the previous stages are classified by determining their polarity. The classification algorithm assigns each review to one of the classes: positive, negative or neutral.

- At the next stage, the overall classification of a review is determined by aggregating the sentence-level opinions using a chosen algorithm. The document-level opinion is thus computed as a summary of the sentence-level results.

- The final stage is the interpretation of the results and the presentation of the classification results at the required level of analysis [6].
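The aggregation step in the list above can be illustrated with a minimal sketch. The source does not say which aggregation algorithm is used, so the majority vote below is an assumption, and the function name `aggregate_review` and the label strings are hypothetical.

```python
from collections import Counter

def aggregate_review(sentence_labels):
    """Combine sentence-level labels into one review-level label by majority vote.

    Majority voting is an assumed aggregation rule; the source only says that
    sentence-level opinions are aggregated "after a chosen algorithm".
    """
    # Neutral sentences do not vote for either polarity.
    counts = Counter(label for label in sentence_labels if label != "neutral")
    pos, neg = counts["positive"], counts["negative"]
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie, or no opinionated sentences at all

review = ["positive", "neutral", "positive", "negative"]
print(aggregate_review(review))  # positive
```

A weighted sum of sentence scores would be an equally plausible choice when the classifier returns scalar values instead of class labels.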

Results can be delivered as a score identifying the polarity of the opinion (usually positive or negative, corresponding to the two classes), as textual categories, or visually using graphs.

After classification, the validity of the results must be checked. The study assesses the classification using measures that are standard in text mining: precision, recall and accuracy. To compare the two proposed methods fairly, their effectiveness is evaluated on the same data set, with sentiment analysis applied at the sentence level. When opinions are classified into classes, with sentences treated as instances of a class, the measures are defined as follows:
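Under the standard text-classification definitions (an assumption here), these measures can be written as below. The symbols TP, FP, TN and FN are introduced for illustration and denote the numbers of true positives, false positives, true negatives and false negatives for a class.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}
\end{align}
```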

Since precision and recall are not always informative individually, we also calculate their harmonic mean, called the F measure:
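Assuming the usual balanced form of this measure (often written F1), the harmonic mean of precision and recall is:

```latex
F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```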

Results

The study aims to determine the opinions expressed in user reviews of movies, taken from specialized websites. These reviews are analyzed using two opinion mining methods:

- Supervised sentiment analysis.

- Unsupervised sentiment analysis.

For the case study we used a collection of reviews extracted and classified into positive and negative classes. The collection, taken from [7], is divided into a set of positive sentences and a set of negative sentences. Reviews submitted on websites usually consist of several sentences, so to simplify the process the analysis was performed at the sentence level. The overall opinion of a review can then be determined from the opinions extracted from each of its sentences, using an aggregation algorithm. Figure 2 shows user reviews of one movie as they appear on a well-known movie review site (www.rottentomatoes.com).

Fig.2 Example of user reviews on a movie

Supervised sentiment analysis

The first sentiment analysis process uses a supervised algorithm based on a naive Bayes classifier. This type of analysis is described in [6]. The advantage of the naive Bayes classifier is that it provides good results while being quick and easy to implement.

The classifier is based on Bayes' theorem, which determines a conditional probability from the probability of the reverse conditional event and the independent (prior) probabilities of the events.

The algorithm analyzes a set of examples labeled as positive or negative and, based on how often each word occurs in each class, estimates the probability that the word carries a positive or negative meaning. The probability of a document or sentence is then computed as the product of the probabilities of its words. The process requires a data set pre-classified into the two classes, positive and negative, as is typical of supervised learning; the word occurrences per class are counted from this set.

This method treats each word of a sentence as independent, with no connection to the others. In reality, some words appear more frequently than others, or depend on each other in certain contexts when expressing an opinion. In addition, certain words rarely occur in expressions of opinion.
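The word-frequency approach described above can be sketched as follows. This is a minimal illustration and not the implementation from [6]: the toy training sentences are invented, add-one (Laplace) smoothing is an assumption added so that a word unseen in one class does not make the whole product zero, and log probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow.

```python
import math
from collections import Counter

def train(examples):
    """Count word occurrences per class from (sentence, label) pairs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for sentence, label in examples:
        class_counts[label] += 1
        word_counts[label].update(sentence.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, class_counts, vocab

def classify(sentence, model):
    """Return the class with the highest log posterior for the sentence."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        score = math.log(class_counts[label] / total)  # log prior of the class
        n_words = sum(word_counts[label].values())
        for word in sentence.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing (an assumption, not stated in the source).
            likelihood = (word_counts[label][word] + 1) / (n_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set; the study itself uses 5000 review sentences.
training = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a dull and boring plot", "neg"),
    ("terrible acting and a weak story", "neg"),
]
model = train(training)
print(classify("great film", model))   # pos
print(classify("boring plot", model))  # neg
```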

Classification efficiency is calculated using the measures discussed above. A training set of 5000 sentences extracted from reviews is used. Based on this training set, 300 sentences are analyzed, giving the following results:

Tab. 1 Efficiency of initial supervised algorithm

Precision | Recall | Accuracy | F Measure
0.814332247557 | 0.75987841945289 | 0.79331306990881 | 0.78616352201258
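As a consistency check on Table 1, the F measure can be recomputed from the reported precision and recall, assuming it is the standard harmonic mean. The helper functions and the small confusion-matrix counts below are hypothetical and serve only to show how the four measures relate; the source does not report raw counts.

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f_measure(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)

# Hypothetical counts, for illustration only.
tp, fp, fn, tn = 8, 2, 4, 6
p, r = precision(tp, fp), recall(tp, fn)
print(p, accuracy(tp, tn, fp, fn))  # 0.8 0.7

# Precision and recall exactly as reported in Table 1.
table1_f = f_measure(0.814332247557, 0.75987841945289)
print(round(table1_f, 5))  # 0.78616, matching the F Measure column
```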

Thus, by applying the naive Bayes classification algorithm we obtained a precision of 0.814332247557 and an accuracy of 0.79331306990881. We next study ways to improve these results. The first step is to remove from the reviews those words that cannot express opinions: articles, prepositions, pronouns and conjunctions. We obtain the following values of the measured indicators:

Tab. 2 Efficiency of supervised algorithm after removing stop words

Precision | Recall | Accuracy | F Measure
0.81699346405229 | 0.75987841945289 | 0.79483282674772 | 0.78740157480315
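The stop-word removal step that precedes Table 2 can be sketched as below. The stop-word list is a small hypothetical sample of articles, prepositions, pronouns and conjunctions; the list actually used in the study is not given.

```python
# Illustrative stop-word list (hypothetical; a real list would be much longer).
STOP_WORDS = {
    "a", "an", "the",                             # articles
    "of", "in", "on", "to",                       # prepositions
    "and", "or", "but",                           # conjunctions
    "it", "this", "that", "i", "he", "she", "we", "they",  # pronouns
}

def remove_stop_words(sentence):
    """Lowercase the sentence, split on whitespace, and drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("The plot of this movie is dull and the acting is weak"))
# ['plot', 'movie', 'is', 'dull', 'acting', 'is', 'weak']
```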

The precision is slightly higher. We continue by trying to include correlations between words. The original algorithm treats words as uncorrelated, but certain words frequently occur together when opinions are expressed. Introducing these correlations means estimating the probability of occurrence of groups of several words instead of single words. Table 3 presents the results for groups of n = 2 words:

Tab. 3 Efficiency of supervised algorithm for groups of n=2 words

Precision | Recall | Accuracy | F Measure
0.82084690553746 | 0.76595744680851 | 0.79939209726444 | 0.79245283018868

For groups of n = 3 words, the results are presented in Table 4:

Tab. 4 Efficiency of supervised algorithm for groups of n=3 words

Precision | Recall | Accuracy | F Measure
0.79723502304147 | 0.5258358662614 | 0.69604863221884 | 0.63369963369963

Maximum efficiency is obtained for groups of up to n = 2 words. For larger groups all four measures decrease, with accuracy falling from 0.799 (Table 3) to 0.696 (Table 4).
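The word groups discussed above are commonly called n-grams. A minimal sketch of how they could be extracted from a tokenized sentence follows; the function name `word_groups` is hypothetical, and the source does not say how its groups were built.

```python
def word_groups(tokens, n):
    """Return all groups of n consecutive words (n-grams) as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "not a good movie".split()
print(word_groups(tokens, 2))
# [('not', 'a'), ('a', 'good'), ('good', 'movie')]
print(word_groups(tokens, 3))
# [('not', 'a', 'good'), ('a', 'good', 'movie')]
```

One plausible reason for the drop at n = 3 is data sparsity: longer groups are rarer, so many groups in the test sentences never occur in the training set. This explanation is an assumption and is not stated in the source.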

Unsupervised sentiment analysis

For the unsupervised analysis of opinions we propose an algorithm that uses a lexical resource. To compare its effectiveness with that of the supervised algorithm described above, the analysis is again performed at the sentence level. The algorithm has several steps. Reviews are split into sentences and each word in them is identified. Next, parts of speech are tagged and the order of the words in each sentence is recorded. Finally, the polarity of each word is looked up in a dictionary and the sentences are classified.

The process is described in detail in [8]. The classification uses the SentiWordNet lexical resource. This resource, described in [9] and developed by Andrea Esuli and Fabrizio Sebastiani, associates each set of synonyms (called a synset) from the WordNet lexical database (http://wordnet.princeton.edu) with scores for positivity, negativity and objectivity. In this study the dictionary is processed so that each word associated with an emotional state receives a positive or negative score depending on its polarity.
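One way such processing could work is sketched below. The source does not give the exact procedure, so averaging the difference between positivity and negativity over all synsets that contain a word is an assumption. The entries and their scores are invented for illustration and are not real SentiWordNet values.

```python
from collections import defaultdict

# Hypothetical entries in the style of SentiWordNet:
# (words in the synset, positivity score, negativity score).
SYNSET_SCORES = [
    (["good", "fine"], 0.75, 0.0),
    (["good"], 0.5, 0.0),
    (["bad", "poor"], 0.0, 0.625),
    (["plot"], 0.0, 0.0),
]

def word_polarity(synset_scores):
    """Collapse synset-level scores into a single signed score per word."""
    per_word = defaultdict(list)
    for words, pos, neg in synset_scores:
        for word in words:
            per_word[word].append(pos - neg)  # > 0 positive, < 0 negative
    # Average over every synset in which the word appears (assumed rule).
    return {word: sum(vals) / len(vals) for word, vals in per_word.items()}

lexicon = word_polarity(SYNSET_SCORES)
print(lexicon["good"])  # 0.625
print(lexicon["bad"])   # -0.625
```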

Previous research has found that some parts of speech are more strongly associated with emotion than others. Nouns and verbs are usually mostly objective, while adjectives and adverbs carry the most subjectivity. In the SentiWordNet lexical dictionary, more than 67% of adverbs have a high subjectivity score. An important stage in sentiment analysis is therefore to identify the parts of speech in a sentence so that words can be classified as accurately as possible.

Several types of algorithms are used for part-of-speech (POS) tagging, based on a training set and a language-specific model. The tags used to identify parts of speech are presented in Table 5.

Tab. 5 Tags used in identifying the parts of speech

Part of speech (POS) | Marker (tag)
Noun | NN, NNP (proper noun), NNPS (proper noun, plural), NNS (plural)
Verb | VB, VBD (past tense), VBP (present tense, non-third person), VBZ (present tense, third person), VBG (gerund or present participle), VBN (past participle)
Adverb | RB, RBR (comparative), RBS (superlative)
Adjective | JJ, JJR (comparative), JJS (superlative)
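The tags in Table 5 share a two-letter prefix per part of speech, which makes them easy to group. The sketch below assumes the sentence has already been tagged by some POS tagger (not shown) and picks out adjectives and adverbs as the most likely carriers of subjectivity. The function names and the tagged example are hypothetical.

```python
# Two-letter tag prefixes from Table 5 mapped to a coarse part of speech.
COARSE_POS = {"NN": "noun", "VB": "verb", "RB": "adverb", "JJ": "adjective"}

def coarse_pos(tag):
    """Map a fine-grained tag such as 'VBD' or 'JJS' to its coarse class."""
    return COARSE_POS.get(tag[:2], "other")

def subjective_candidates(tagged_sentence):
    """Keep the words tagged as adjectives or adverbs."""
    return [word for word, tag in tagged_sentence
            if coarse_pos(tag) in ("adjective", "adverb")]

# Hypothetical output of a POS tagger for "the film was surprisingly good".
tagged = [("the", "DT"), ("film", "NN"), ("was", "VBD"),
          ("surprisingly", "RB"), ("good", "JJ")]
print(subjective_candidates(tagged))  # ['surprisingly', 'good']
```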

A common problem in sentiment analysis is how to treat negation. The presence of a negation can often completely change the meaning of a text. Negation is handled by setting rules for changing polarity. In the proposed unsupervised sentiment analysis, when a negation is identified, the polarity of subjective words up to four positions before and after the negation is reversed. The position of the negation within the phrase is also taken into account when the phrase contains several separate sentences [8]. Negations are identified using a negation list (a list of language-specific words expressing negation).
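The negation rule described above can be sketched as follows. The window of four positions comes from the text; the negation list, the word scores, and the decision to classify by the sign of the summed scores are assumptions made for illustration. The sketch does not model the sentence-boundary handling mentioned in [8].

```python
NEGATIONS = {"not", "no", "never"}  # illustrative negation list
WINDOW = 4                          # positions before and after, as in the text

def sentence_polarity(tokens, lexicon):
    """Classify a tokenized sentence using word scores and negation flipping."""
    scores = [lexicon.get(token, 0.0) for token in tokens]
    for i, token in enumerate(tokens):
        if token in NEGATIONS:
            start = max(0, i - WINDOW)
            end = min(len(tokens), i + WINDOW + 1)
            # Reverse the polarity of every word inside the window.
            for j in range(start, end):
                scores[j] = -scores[j]
    total = sum(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

# Hypothetical word scores; a real run would use the processed dictionary.
toy_lexicon = {"good": 0.6, "great": 0.8, "boring": -0.5}
print(sentence_polarity("the movie was not good".split(), toy_lexicon))  # negative
print(sentence_polarity("a great movie".split(), toy_lexicon))           # positive
```

In this sketch, two negations with overlapping windows flip a word twice and so restore its original polarity.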

We assess the effectiveness of the proposed unsupervised classification on the same set of sentences used for the naive Bayes algorithm. This yields the following results:

Tab. 6 Efficiency of unsupervised algorithm

Precision | Recall | Accuracy | F Measure
0.57640750670241 | 0.65151515151515 | 0.58636363636364 | 0.61166

Comparing the two proposed methods for detecting opinions, a supervised naive Bayes algorithm and an unsupervised algorithm based on a lexical dictionary, we observe higher accuracy for the supervised algorithm (0.793 in Table 1 versus 0.586 in Table 6). This result can be explained by the fact that the supervised algorithm is strictly focused on the domain, as it requires a pre-classified training set. The unsupervised algorithm still achieves a satisfactory precision, considering that it can be used in a variety of domains and does not require substantial effort to create a training set.


