Look At Intelligent Assistance System Computer Science Essay


02 Nov 2017


Mr. Shah Sahil K. (ME-II Computer Engg.), Vidya Pratishthan's College of Engineering, Baramati.

Prof. Takale Sheetal A., Assistant Professor, Information Technology Department, Vidya Pratishthan's College of Engineering, Baramati.

Abstract

Good customer support is something every customer needs, and most companies provide it in the form of helpdesk or assistance systems. Helpdesk systems available today work on the principle of case-based reasoning. Such systems face the major challenge of maintaining an up-to-date case history for every customer problem. In the proposed system, we present the idea of using the results returned by a search engine as the case history. However, current search engines return keyword-based matches without considering the semantic relevance of the results to the user query. In addition, for a given keyword-based query, the user has to work down the list, checking each individual link until the desired result is found. This degrades the quality of service provided by search engines.

To address these challenges, an Intelligent Assistance System, iASSIST, is proposed. It uses search engine results as the case history for the user query, which removes the need to maintain an up-to-date case history for every customer query. The proposed system ranks the search results by their semantic relevance to the request, computed using NEC SENNA and WordNet. The relevant results are grouped into document clusters using the Minimum Description Length (MDL) principle, and the sentences within them are clustered with the Symmetric Non-negative Matrix Factorization (SNMF) algorithm. Each cluster is summarized using a request-focused multi-document summarization technique to generate a concise solution. The proposed system is evaluated by a user survey. The experiments demonstrate the effectiveness of iASSIST in semantic text understanding, document clustering and summarization. The better performance of iASSIST results from sentence-level semantic analysis and from clustering using the MDL principle and SNMF.

Key terms

Intelligent Helpdesk, Semantic Similarity, Web Search Results, Document Summarization

1. Introduction

An intelligent help desk system is useful to nearly everyone. Many organizations use case-based help desk systems to improve the quality of customer service. For a given customer request, an intelligent helpdesk system tries to find earlier similar requests and the case history associated with them. Helpdesk systems usually use databases to store past interactions between customers and companies. An interaction may consist of the description of a problem and the recommended solution. The major challenge faced by these help desk systems is the maintenance of an up-to-date case history: maintaining one for each and every problem is difficult and costly.

Search engines already act as an intelligent help facility for all users of the internet. However, content on the Web and on enterprise intranets is increasing day by day. The Web is a vast collection of completely uncontrolled heterogeneous documents. It is huge, diverse, and dynamic. For a user keyword query, current Web search engines return a list of pages matching the query. However, the information on a topic is often distributed among multiple physical pages, especially for multi-topic queries in which the individual query keywords occur relatively frequently in the document collection but rarely occur together in the same document. As a result, search engine users are drowning in information but starving for knowledge.

1.1. Existing System

Currently there are a number of helpdesk systems which try to find earlier similar requests and the case history associated with the customer request. These systems are domain specific and return solutions using keyword-based search. They face two challenges: 1) Case retrieval measures: most case-based systems use traditional keyword-matching-based ranking schemes [12,13] for case retrieval and have difficulty capturing the semantic meaning of cases; and 2) Result representation: most case-based systems return a list of past cases ranked by their relevance to a new request, and customers have to go through the list and examine the cases one by one to identify the ones they want. In addition, maintaining an up-to-date case history is a major problem for these systems.


1.2. Motivation

• Help desk systems: Many industries use help desk systems or customer care to resolve customer queries. Companies provide solutions to customer problems in three ways: an online help desk system, a customer representative, or a customer care representative (telephone enquiry). Many of the problems handled by these systems are solved either by referring to solutions for similar problems faced previously by customers, or by asking the user a series of questions related to the problem and narrowing (filtering) down the solution (case-based systems).

• Search engines: Search engines act as an intelligent help facility for all users of the internet. For a given user keyword query, current web search engines return a list of individual web pages. However, information for the query is often spread across multiple pages. The search engine results can be used as a data set providing solutions from different domains.

1.3. Proposed System

The proposed system addresses the challenges faced by present help desk systems and web search engines by developing an online helpdesk system, iASSIST. It automatically finds problem-solution patterns on the web using search engines such as Google and Yahoo. For a given user query, iASSIST interacts with the search engine to retrieve relevant solutions. The retrieved solutions are ranked by their semantic similarity to the user query, where semantic similarity is based on semantic roles and semantic meanings. The semantically related documents are then grouped into clusters based on the Minimum Description Length (MDL) principle. Finally, in order to support multi-topic queries, multi-document summarization is performed using the Symmetric Non-negative Matrix Factorization (SNMF) algorithm and a request-focused summarization technique.

1.4. Features of Proposed System

• It automatically finds "problem-solution" patterns using a web search engine. Because no up-to-date case history has to be maintained, the system can address queries from any domain.

• It uses semantic role labeling and a semantic dictionary to extract the semantics of sentences and of the query.

• It groups semantically and contextually similar documents using a clustering algorithm based on the MDL principle and matrix factorization.

• It generates a concise description (summary) of the solution to the problem.

2. Related Work

2.1. Case Based Systems

A considerable amount of work has been done on case-based recommender systems and decision guides, where the user provides a brief initial description of a problem. These systems use the initial information to retrieve a candidate set of cases that are similar to the given problem. A case-based system is a system that uses knowledge of past cases similar to a new customer request when finding a solution to that request. The working of a general case-based system is shown in Figure 1.

Example: An auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms is using case-based reasoning.

Existing systems of this kind are described below.


Figure 1: Case-based problem-solving System[14]

2.1.1. Parameterized Search Engines [5]

This search engine is based on attributes rather than Boolean combinations of keywords. The search considers user preferences across all dimensions or domains. It helps to increase decision quality, decision confidence, perceived ease of use and perceived usefulness.

Drawbacks:

This model was developed for an online shopping system, but it considers only very small domains when tracing different decisions. The model is quantitative and structural: it considers input, process and output variables, but does not trace the processes themselves. Another approach would be to model the user as a Bayesian information processor, which would require updating probabilistic beliefs as users acquire information.

2.1.2. Conversational Recommender Systems with Feature Selection [10]

In these systems, given an initial user query, the recommender system asks the user to provide additional features describing the products being searched for. The aim is to generate questions (features) that the user is likely to answer and that, if answered, would effectively reduce the result size of the initial query. Classical entropy-based feature selection methods can be effective in terms of result size reduction, but they select questions that are uncorrelated with user needs and are therefore unlikely to be answered. Feature-selection methods that combine feature entropy with an appropriate measure of feature relevance can better select questions that matter to the user and avoid unwanted ones.

Drawbacks:

These systems require some background knowledge of user behaviour, such as feature popularity and probabilistic dependency between features; that is, user preferences must be known before problem solving begins so that proper decisions can be made.

2.1.3. Incremental Case Based Reasoning [4, 8]

Incremental Case-Based Reasoning (I-CBR)

is an incremental case-retrieval technique based on

information-theoretic analysis. The technique is incremental

in the sense that it does not require the entire

target case description to be available at the start, but

in fact builds it up by asking focused questions of the

user. The ordering of these questions reflects their power

to discriminate effectively between the set of candidate

cases at each step.

Drawbacks:

When the description of cases or items becomes complicated, these case-based systems suffer from the curse of dimensionality, and the similarity/distance between cases or items becomes difficult to measure. Furthermore, the similarity measures used in these systems are usually based on keyword matching, which lacks semantic analysis of customer requests and existing cases.

2.2. Database Search and Ranking [11]

In database search, many methods have been proposed to perform similarity search and rank the results of a query. Examples include context-sensitive ranking, which considers the user's preference for one item over another; automatic ranking of query results, which uses TF-IDF [12,13] to compute the relevance of the user query to previous cases; and nearest-neighbour search.

Drawbacks:

As in the case-based systems, similarity is measured by keyword matching, which has difficulty understanding the text deeply: it does not consider the contextual relevance between user requests and stored past cases.

2.3. Clustering Search Results

Existing search engines often return a long list of search results. Clustering technologies are therefore used to organize search results so that the effort users spend searching down the list is minimized. Such clustering has been implemented as a dynamic interface to a web search engine, called the Grouper interface.

2.3.1. Grouper [4]

Grouper is an interface to the results of the Husky Search meta-search engine. It dynamically groups the search results into clusters labelled by phrases extracted from the snippets, using different document clustering algorithms. However, existing document-clustering algorithms such as suffix tree clustering (STC) and web document (snippet) clustering do not consider the impact of the general and common information contained in the documents. In our proposed work, this common information is filtered out, so that clustering quality can be improved and better context organization can be obtained.

Table 1: Mathematical Model of Proposed System-I

3. Programmer’s design

3.1. Mathematical Model

Table 1 and Table 2 give the mathematical model of the proposed system.

3.2. Data Flow architecture

Figure 2 shows the system architecture of iASSIST. The system works in five modules: Preprocessing Module, Case Ranking Module, Document Clustering Module, Sentence Clustering Module and Sentence Cluster Summarization Module. As shown in the figure, the input to the system is a user query in the form of a question. The system retrieves relevant solutions, or past cases, from the search engine. Preprocessing of the user query and past cases involves removal of non-words; each retrieved document is then split into sentences and passed through a semantic role parser for semantic role labelling. The Case Ranking Module ranks the retrieved documents based on their sentence-level semantic similarity to the user query. The semantically ranked documents then need to be grouped according to context, so the top-ranking documents are clustered using the Minimum Description Length (MDL) principle [1]. The Sentence Clustering Module groups sentences with similar meaning into clusters using Symmetric Non-negative Matrix Factorization (SNMF) [3]. The Sentence Cluster Summarization Module selects the most relevant sentences [2] from each cluster to form a concise summary, which is presented to the user as the reference solution.

Table 2: Mathematical Model of Proposed System-II

Figure 2: System Architecture

3.2.1. Preprocessing Module

According to Luhn's idea [13], in order to remove redundancy in documents and to reduce document size, it is essential to consider only meaningful words. Preprocessing of a problem-solution pattern therefore normally involves removal of non-words and stop words, and suffix removal, for both the user query and the documents retrieved from the search engine. In the proposed work, however, semantic role information contributes greatly to the semantic similarity decision, so preprocessing for semantic similarity computation in this implementation does not remove stop words or suffixes. Each sentence in a retrieved document is then passed to a semantic role parser to find the semantic meaning of the sentence based on the frames (or verbs) it contains.
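The preprocessing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the regular expressions and the tiny stop-word list are assumptions made for illustration only. Stop-word removal is kept optional because the text states that it is skipped when the output feeds the semantic role parser.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used by iASSIST).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is"}


def split_sentences(text):
    """Split a retrieved document into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def clean_sentence(sentence, remove_stop_words=False):
    """Drop non-words (digits, punctuation) and optionally stop words."""
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


if __name__ == "__main__":
    doc = "Open the case. Add 2 GB of memory!!"
    for sentence in split_sentences(doc):
        # Stop words are kept here, as for the semantic similarity path.
        print(clean_sentence(sentence))
```

Suffix removal (stemming) is left out of the sketch; a stemmer would be applied to the token list only on the path where stop words are also removed.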

Semantic Role Labeling

Semantic role labelling, sometimes also called shallow

semantic parsing, is a task in natural language processing

consisting of the detection of the semantic arguments

associated with the predicate or verb of a sentence

and their classification into their specific roles. A

semantic role is "a description of the relationship that

a constituent plays with respect to the verb in the sentence".

For example, given a sentence like "Riya sold the

book to Abbas", the task would be to recognize the verb

"to sell" as representing the predicate, "Riya" as representing

the seller (agent), "the book" as representing the

goods (theme), and "Abbas" as representing the recipient. This is an important step towards making sense of

the meaning of a sentence. A semantic representation of

this sort is at a higher-level of abstraction than a syntax

tree. For instance, the sentence "The book was sold by

Riya to Abbas" has a different syntactic form, but the

same semantic roles.

In order to analyze the user query and the documents, the semantic roles in each sentence are computed by passing the sentences through a semantic role parser. This helps in categorizing the documents by their semantic importance to the user query. In iASSIST, NEC SENNA is used as the semantic role labeler; it is based on PropBank [9] semantic annotation. The labeler labels each verb in a sentence with its propositional arguments, and the labelling for each particular verb is called a "frame." Therefore, for each sentence, the number of frames generated by the parser equals the number of verbs in the sentence. A set of abstract arguments given by the labeler indicates the semantic role of each term in a frame. In general, Arg[m] represents the role of a term in the given sentence, where m is the argument number within the sentence. For example, Arg0 is the actor, and Arg-NEG indicates negation.

In general, a given sentence is parsed into different arguments by the semantic role labeler, with the syntax shown in Figure 3.

Figure 3: Semantic Role Syntax
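To make the notion of a frame concrete, the sketch below stores hand-labelled frames for the "Riya sold the book to Abbas" example. The `Frame` class and the role assignments are assumptions for illustration: the labels were written by hand following the usual PropBank numbering (Arg0 seller, Arg1 thing sold, Arg2 buyer) and are not actual SENNA output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    """One frame: a verb (the 'rel') plus its labelled arguments."""
    verb: str
    roles: Dict[str, str]


def same_meaning(a: Frame, b: Frame) -> bool:
    """Two frames agree when the verb and every role filler match."""
    return a.verb == b.verb and a.roles == b.roles


# Hand-labelled frames for the active and passive forms of the same sentence.
active = Frame("sold", {"Arg0": "Riya", "Arg1": "the book", "Arg2": "Abbas"})
passive = Frame("sold", {"Arg0": "Riya", "Arg1": "the book", "Arg2": "Abbas"})

# A sentence with two verbs yields two frames, one per verb.
two_verbs: List[Frame] = [
    Frame("bought", {"Arg0": "Abbas", "Arg1": "the book"}),
    Frame("read", {"Arg0": "Abbas", "Arg1": "it"}),
]

if __name__ == "__main__":
    print(same_meaning(active, passive))
    print(len(two_verbs))
```

The point of the representation is that the active and passive sentences differ syntactically but map to the same role assignment, which is what the later similarity computation compares.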

3.2.2. Sentence-Level Semantic Similarity (SLSS) Computation and Top Relevant Document Ranking

To assist users in finding answers relevant to their query, the documents retrieved from the search engine must be ranked by their semantic importance to the input query as soon as a new query arrives. To rank the documents, similarity scores between the retrieved documents and the input user query are computed. A simple keyword-based similarity measure, such as cosine similarity, cannot capture semantic similarity. This system therefore calculates the semantic similarity between the user query and the sentences in the retrieved documents based on semantic role analysis. In addition, the similarity computation uses WordNet in order to better capture semantically related words. Figure 4 gives the algorithmic design of the SLSS calculation and top document ranking.

Algorithm: Sentence-Level Semantic Similarity Calculation and Top Document Ranking

Figure 4: Sentence Level Semantic Similarity Computation
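Since the algorithm itself is given only as a figure, the following is a rough sketch of how role-based sentence similarity and document ranking could be wired together. Everything here is an assumption made for illustration: frames are plain dictionaries from role label to a set of terms, the role labels are hand-written, Jaccard term overlap stands in for the WordNet-based term similarity, and the normalisation and the "best sentence wins" document score are choices of this sketch rather than the published procedure.

```python
from typing import Dict, List, Set, Tuple

Frame = Dict[str, Set[str]]  # role label -> terms filling that role


def term_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap; a stand-in for a WordNet-based term similarity."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def frame_similarity(f1: Frame, f2: Frame) -> float:
    """Compare only terms that play the same semantic role in both frames."""
    shared_roles = f1.keys() & f2.keys()
    if not shared_roles:
        return 0.0
    total = sum(term_overlap(f1[r], f2[r]) for r in shared_roles)
    return total / max(len(f1), len(f2))


def sentence_similarity(s1: List[Frame], s2: List[Frame]) -> float:
    """Best-matching pair of frames between two sentences."""
    return max((frame_similarity(a, b) for a in s1 for b in s2), default=0.0)


def rank_documents(query: List[Frame],
                   docs: Dict[str, List[List[Frame]]]) -> List[Tuple[str, float]]:
    """Score each document by its most similar sentence, highest first."""
    scores = {
        name: max((sentence_similarity(query, s) for s in sentences), default=0.0)
        for name, sentences in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    query = [{"rel": {"add"}, "A1": {"memory"}, "AM-LOC": {"printing", "room"}}]
    docs = {
        "memory_howto": [[{"rel": {"add"}, "A1": {"memory", "module"}}]],
        "printer_faq": [[{"rel": {"print"}, "A1": {"page"},
                          "AM-LOC": {"printing", "room"}}]],
    }
    for name, score in rank_documents(query, docs):
        print(name, round(score, 3))
```

Because terms are compared role by role, a document that merely shares a location phrase with the query scores lower than one that shares the verb and its object.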


3.2.3. Document Clustering Using MDL Principle

The identified top-ranking cases are all relevant to the user query, but they may actually belong to different categories. For example, if the user query is "Give Information about Taj Mahal", the relevant cases may involve Taj Mahal as a tea brand, Taj as a five-star hotel, or the Taj Mahal as a white marble mausoleum. Therefore, it is necessary to further group these cases into different contexts. The proposed system uses the Minimum Description Length (MDL) principle to place documents with similar meaning in one group. The MDL principle states that the best model inferred from given data is the one that minimizes the sum of the length of the model, in bits, and the length of the encoding of the data using that model, in bits. Figure 5 describes the detailed document clustering steps using the MDL approach.

Algorithm: Document Clustering using MDL Principle.

Figure 5: Document Clustering using MDL

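MDL COST Equation

$$\text{MDL Cost of } C = \frac{\alpha}{\beta} \cdot \left(\text{no. of 1s in } M_{TC} + \text{no. of 1s and } {-1}\text{s in } M_{\Delta}\right) + |D| \cdot \log_2 |D|$$

where $\alpha$ and $\beta$ are computed using the $M_{TD}$ matrix:

$$\alpha = \sum_{x \in \{0,1\}} -\Pr(x) \log_2 \Pr(x)$$

$$\beta = \text{total no. of 1s in matrix } M_{TD}$$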

Algorithm AggloMDL(D)

Begin

1. Let C = {c1, c2, c3, ..., cn}, with ci = ({di})

2. Select the best cluster pair (ci, cj) from C for merging and form the new cluster ck:

   (ci, cj, ck) := GetBestPair(C)

3. while (ci, cj, ck) is not empty do {

4.   C := C − {ci, cj} ∪ {ck}

5.   (ci, cj, ck) := GetBestPair(C);

6. }

7. return C

End

procedure GetBestPair(C)

Begin

1. MDLcostmin := ∞

2. for each pair (ci, cj) of clusters in C do

3. {

4.   (MDLcost, ck) := GetMDLCost(ci, cj, C);

     /* GetMDLCost returns the optimal MDLcost when ck is made by merging ci and cj */

5.   if MDLcost < MDLcostmin then

6.   {

7.     MDLcostmin := MDLcost;

8.     (cBi, cBj, cBk) := (ci, cj, ck);

9.   }

10. }

11. return (cBi, cBj, cBk)

End

procedure GetMDLCost(ci, cj, C)

Begin

1. Dk := Di ∪ Dj;

3. ck := (Dk);

4. C := C − {ci, cj} ∪ {ck};

5. MDL := approximate MDL cost of C by the MDL COST Equation

6. return (MDL, ck);

End
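The agglomerative procedure above can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' Java code. The cost function is passed in as a parameter, and `toy_cost` is a deliberately simple stand-in (terms shared by a cluster count once as "model", the remaining terms of each document count as "data"); it is not the MDL COST Equation. One further assumption: `get_best_pair` starts from the cost of the current clustering instead of infinity, so that it returns nothing when no merge lowers the cost and the loop terminates.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Cluster = FrozenSet[str]  # a cluster is a set of document ids
CostFn = Callable[[List[Cluster]], float]


def toy_cost(clusters: List[Cluster], terms: Dict[str, Set[str]]) -> float:
    """Stand-in description length: shared terms once, plus per-document leftovers."""
    cost = 0
    for cluster in clusters:
        shared = set.intersection(*(terms[d] for d in cluster))
        cost += len(shared)
        cost += sum(len(terms[d] - shared) for d in cluster)
    return cost


def get_best_pair(clusters: List[Cluster],
                  cost: CostFn) -> Optional[Tuple[Cluster, Cluster, Cluster]]:
    """Return the merge (ci, cj, ck) with the lowest cost, or None if none helps."""
    best, best_cost = None, cost(clusters)  # assumption: a merge must lower the cost
    for ci, cj in combinations(clusters, 2):
        ck = ci | cj
        candidate = [c for c in clusters if c != ci and c != cj] + [ck]
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = (ci, cj, ck), candidate_cost
    return best


def agglo_mdl(doc_ids: List[str], cost: CostFn) -> List[Cluster]:
    """Start with one cluster per document and merge while the cost decreases."""
    clusters: List[Cluster] = [frozenset([d]) for d in doc_ids]
    pair = get_best_pair(clusters, cost)
    while pair is not None:
        ci, cj, ck = pair
        clusters = [c for c in clusters if c != ci and c != cj] + [ck]
        pair = get_best_pair(clusters, cost)
    return clusters


if __name__ == "__main__":
    terms = {
        "d1": {"taj", "mahal", "agra", "marble"},
        "d2": {"taj", "mahal", "agra", "tomb"},
        "d3": {"taj", "mahal", "tea", "brand"},
        "d4": {"taj", "tea", "brand", "assam"},
    }
    for cluster in agglo_mdl(list(terms), lambda cs: toy_cost(cs, terms)):
        print(sorted(cluster))
```

With the toy cost, the two mausoleum documents and the two tea-brand documents end up in separate clusters, because merging all four would leave only "taj" as shared content and raise the total cost.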

3.2.4. Clustering Using Symmetric Non-Negative Matrix Factorization (SNMF) Algorithm

Once document clusters of similar context have been obtained by the MDL clustering algorithm, a summary is generated by extracting important sentences (extractive summarization). For this, the pairwise similarity between sentences in the document clusters needs to be considered: sentences with similar meaning (higher similarity values) are grouped into sentence clusters.

Most clustering algorithms deal with a rectangular data matrix, i.e. either a document-term matrix or a sentence-term matrix. Such a representation does not capture the pairwise similarity between sentences. In this paper, we cluster the sentences using a sentence similarity matrix (sentence-sentence matrix), which better captures the pairwise similarity, and we use the symmetric non-negative matrix factorization (SNMF) algorithm to find the sentence clusters.

Non-negative Matrix Factorization (NMF)

NMF is a group of algorithms in multivariate analysis and linear algebra in which a matrix V is factorized into (usually) two matrices W and H such that:

nmf(V) → WH

Factorization of matrices is generally non-unique, and NMF adds the constraint that the factors W and H must be non-negative, i.e., all elements must be equal to or greater than zero. The factorization is carried out in order to extract important objects. As the input matrix here is symmetric, we use the SNMF algorithm. The stepwise procedure for clustering sentences using SNMF is shown in Figure 6.

Algorithm: Symmetric Non-negative Matrix Factorization (SNMF).

Figure 6: Sentence Clustering using SNMF
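As the stepwise procedure is only available as a figure, the sketch below shows one common way to factorize a symmetric sentence-similarity matrix W as H·Hᵀ with non-negative H, using a multiplicative update with damping factor 0.5. The update rule, the random initialisation, the iteration count and all function names are assumptions of this sketch and may differ from the procedure the authors used. Each sentence is assigned to the cluster whose column of H is largest in its row.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def frob_error(w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of W - H * H^T."""
    approx = matmul(h, transpose(h))
    n = len(w)
    return sum((w[i][j] - approx[i][j]) ** 2 for i in range(n) for j in range(n))


def snmf(w: Matrix, k: int, iters: int = 500, beta: float = 0.5,
         seed: int = 0, eps: float = 1e-12) -> Matrix:
    """Approximate symmetric non-negative W by H * H^T with H of shape n x k."""
    rng = random.Random(seed)
    n = len(w)
    h = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        wh = matmul(w, h)                          # numerator   (W H)
        hhth = matmul(matmul(h, transpose(h)), h)  # denominator (H H^T H)
        h = [[h[i][c] * (1 - beta + beta * wh[i][c] / (hhth[i][c] + eps))
              for c in range(k)] for i in range(n)]
    return h


def assign_clusters(h: Matrix) -> List[int]:
    """Sentence i goes to the cluster with the largest entry in row i of H."""
    return [max(range(len(row)), key=row.__getitem__) for row in h]


if __name__ == "__main__":
    # Toy similarity matrix: sentences 0-1 are similar, as are sentences 2-3.
    W = [[1.0, 0.9, 0.0, 0.0],
         [0.9, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.8],
         [0.0, 0.0, 0.8, 1.0]]
    H = snmf(W, k=2)
    print(assign_clusters(H))
    print(frob_error(W, H))
```

Working on the sentence-sentence matrix directly means the factorization sees pairwise similarities, which is the motivation given above for preferring SNMF over clustering a sentence-term matrix.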

3.2.5. Summarization of Each Sentence Cluster

Once the sentence clusters have been obtained, a concise summary must be generated by extracting important sentences from these clusters. To produce a reference solution, the system performs multi-document summarization to generate a concise solution (summary) for each sentence cluster. The summarization method we use is extractive summarization focused on the customer request (request-focused extractive summarization).

Generating a concise solution from multiple documents raises some issues:

1. The information contained in different documents often overlaps; it is therefore necessary to find an effective way to merge the documents while recognizing and removing redundancy.

2. Important differences between documents must be identified, and the informative content must be covered as much as possible.

To resolve these issues, we select only semantically important sentences using a measure that combines internal information (the computed similarity between sentences in the sentence cluster) and external information (the input query given by the user).

Algorithm: Within-Cluster Sentence Selection

After grouping the sentences into clusters by the SNMF algorithm:

1. Remove the noisy clusters (clusters containing fewer than three sentences).

2. Then, in each sentence cluster, rank the sentences by the sentence score calculated with the following equations. The score of a sentence measures how important it is to include the sentence in the final concise solution (summary).

$$\mathrm{Score}(S_i) = \lambda F_1(S_i) + (1 - \lambda) F_2(S_i)$$

Internal similarity measure:

$$F_1(S_i) = \frac{1}{N-1} \sum_{S_j \in C_k - S_i} \mathrm{Sim}(S_i, S_j)$$

External similarity measure:

$$F_2(S_i) = \mathrm{Sim}(S_i, \text{request})$$

where $F_1(S_i)$ measures the average similarity score between sentence $S_i$ and all other sentences in cluster $C_k$, and $N$ is the number of sentences in $C_k$. $F_2(S_i)$ represents the similarity between sentence $S_i$ and the input request. The weight parameter $\lambda$ is set to 0.7 by trial and error; a high value of $\lambda$ gives more weight to internal similarity.

Table 3: Algorithm: Within Cluster Sentence Selection
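The sentence scoring in Table 3 can be sketched as below. The weighting of internal and external similarity follows the score equation, with the weight 0.7 and the three-sentence threshold taken from the text; the `jaccard` word-overlap function is only a placeholder assumption for the semantic similarity `Sim`, and the function names are hypothetical.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a placeholder for the semantic Sim(., .)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def rank_cluster(cluster: List[str], request: str,
                 sim: Callable[[str, str], float] = jaccard,
                 weight: float = 0.7) -> List[Tuple[str, float]]:
    """Score each sentence as weight * F1 + (1 - weight) * F2, highest first."""
    n = len(cluster)
    if n < 3:
        return []  # noisy cluster: fewer than three sentences
    scored = []
    for i, sentence in enumerate(cluster):
        # F1: average similarity to the other sentences in the cluster.
        f1 = sum(sim(sentence, other)
                 for j, other in enumerate(cluster) if j != i) / (n - 1)
        # F2: similarity to the user request.
        f2 = sim(sentence, request)
        scored.append((sentence, weight * f1 + (1 - weight) * f2))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    cluster = ["add memory to computer", "add memory module",
               "computer memory upgrade"]
    for sentence, score in rank_cluster(cluster, "add memory computer"):
        print(f"{score:.3f}  {sentence}")
```

The top-scoring sentence of each surviving cluster would then be taken into the reference solution.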

After the sentences have been extracted from the different sentence clusters, a concise reference solution set is generated for the given user query. The system also lets the user visit the web page or document from which each sentence in the summarized reference solution was extracted.

Figure 7 shows the flow of the proposed system.

Figure 7: Flow of Proposed System

4. Results and Discussion

In this section, the results obtained by iASSIST for different user queries are presented, together with a performance analysis and a comparison of iASSIST with current helpdesk systems. All the classes in the proposed system were coded and compiled in Java 1.7. The results might differ slightly for other settings. All tests were carried out on a 2.27 GHz Intel Core i3 CPU with 4 GB RAM under the MS Windows 7 (64-bit) operating system. In the proposed system, the search engine results that serve as the case history for the user query are used as the data set.

In the experiments, questions/queries were selected at random from different contexts, and the search results returned by the search engine were used as the dataset. During the user survey, the user was asked to manually generate a solution for each selected query by referring to the dataset. The sentences in the solution generated by the user are considered the relevant sentence set.

In this section, some illustrative scenarios are presented, in which the proposed request-focused case-ranking results are compared with the user-evaluated summarization, which is assumed to have high accuracy.

Scenario 1: Give information about taj mahal.

Table 4 shows the concise solution generated by iASSIST and the manually evaluated summary. For iASSIST, the word "give" is a verb, and its semantic role is "rel." Therefore, cases related only to the keyword "give" receive a lower similarity score than cases containing actual information about the Taj Mahal.

Scenario 2: The computer in the printing room needs to add memory.

In this scenario (Table 5), a search engine will take "printing" as a keyword and return many cases related to printing or printers. Obviously, these results are not useful to the user. In iASSIST, when the cases are ranked, the word "printing" carries the location tag as its semantic role, so cases related only to "printing" are not retrieved. Instead, more importance is given to the term "add", as its semantic role is rel (verb). This helps in returning cases that are related to how to add memory to a computer.

The performance of the proposed system is analysed by comparing the solution generated by iASSIST with the results of a standard automated summarization tool. The performance of iASSIST is measured using the standard IR measures precision and recall:

$$\mathrm{Recall} = \frac{|S_{man} \cap S_{sys}|}{|S_{man}|}$$

$$\mathrm{Precision} = \frac{|S_{man} \cap S_{sys}|}{|S_{sys}|}$$

where

$S_{man}$: set of sentences selected by manual evaluation

$S_{sys}$: set of sentences selected by iASSIST in the final summary

Table 4: Top-ranking Summary Samples By Manual Evaluation and iASSIST In Scenario I
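The two measures translate directly into set operations. The sketch below is illustrative only; the function name and the sentence identifiers are made up, and returning 0.0 for an empty set is a convention chosen here.

```python
from typing import Iterable, Tuple


def precision_recall(manual: Iterable[str], system: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of the system summary against the manual one."""
    manual_set, system_set = set(manual), set(system)
    overlap = len(manual_set & system_set)
    precision = overlap / len(system_set) if system_set else 0.0
    recall = overlap / len(manual_set) if manual_set else 0.0
    return precision, recall


if __name__ == "__main__":
    s_man = {"s1", "s2", "s3", "s4"}  # sentences chosen by the user
    s_sys = {"s2", "s3", "s5"}        # sentences chosen by the system
    p, r = precision_recall(s_man, s_sys)
    print(f"precision={p:.3f} recall={r:.3f}")
```

In this made-up example two of the three system sentences are relevant (precision 2/3), and two of the four manually selected sentences are recovered (recall 1/2).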

We assume that the sentences selected by the user during manual evaluation are always relevant from the user's perspective. Thus, Sman is considered the relevant sentence set. Table 6 shows precision and recall values for sample user queries. Figures 8 and 9 show the average precision and recall of the two techniques. Graphically, recall and precision for different user queries can be shown as in figure 9 and 10. The higher precision and recall values of iASSIST compared to automated summarization tools demonstrate that the semantic similarity calculation better captures the meaning of the user requests and of the case documents returned by the search engine. A comparison of the proposed iASSIST system with current helpdesk systems is shown in Table 7.

Table 5: Top-ranking Summary Samples By Manual Evaluation and iASSIST In Scenario II

From the analysis, it is observed that user satisfaction can be improved by capturing semantically related cases rather than only keyword-matching cases. From the recall and precision values obtained for the sample scenarios, we conclude that combining the MDL principle, which groups documents according to different contexts, with the SNMF clustering algorithm helps users to easily find their desired solutions across multiple physical pages. The problem of maintaining an up-to-date history of past cases is solved by using the search engine as a database. In addition, the user can pose a query about a problem from any domain.

Table 6: Performance Analysis of iASSIST

Figure 8: Precision of retrieved solutions

5. Conclusion

The proposed system presents a new approach to the problem of intelligent help desk systems and addresses the problem of search result summarization. The proposed iASSIST system gives its users a single point of access for their problems by providing solutions from different domains. The system automatically finds a problem-solution pattern for a new request given by the user by making use of the search results returned by the search engine. The use of semantic case ranking, MDL clustering and SNMF with request-focused multi-document summarization helps to improve the performance of iASSIST. In this work, we presented a new technique in which text documents can be clustered using the MDL principle. The basic idea of clustering using MDL was originally applied to clustering web pages and extracting templates; we adapted this technique to cluster the text documents returned by the search engine. As the proposed system uses search engine results as the case history for the user query, the problem of maintaining an updated case history for each and every problem is automatically resolved. Compared to keyword-based document similarity and summarization methods, the proposed method is effective in extracting semantic information, which in turn improves the overall result of summarization.

Figure 9: Recall of retrieved solutions

Figure 10: Precision Analysis for sample user queries


