Various Categories Of Question Answering System


02 Nov 2017


Abstract: Search engines return ranked lists of documents in response to a query, leaving the user to navigate the results and locate the correct answer. This wastes the user's time and makes the need for automated question answering systems more urgent. What is needed is a system that can return an exact and concise answer to a question posed in natural language. Question answering systems (QAS) address this problem: their basic aim is to give the user a short, correct answer and so save navigation time. Natural Language Processing (NLP) plays an important role in developing any QAS. This paper surveys implementation approaches for various categories of QAS, namely closed domain QAS, open domain QAS, web based QAS, Information Retrieval or Information Extraction (IR/IE) based QAS, and rule based QAS, which should be helpful for new directions of research in this area.

Keywords: Information Extraction (IE), Information Retrieval (IR), Natural Language Processing (NLP), Question Answering System (QAS), Search Engines.

1. Introduction

With advances in technology, it has become very easy to fetch required information with a single mouse click. Search engines serve their users efficiently, but their drawback lies in the results they provide: users must pick out the exact information that suits their query from the many results returned. The first QAS were developed in the 1960s. A QAS can be developed for a closed domain such as medicine, education or construction, or for an open domain, where the user can get an answer to almost any query. The systems developed in the 1990s were mostly domain specific and used natural language (NL) interfaces to improve their efficiency, whereas today's systems mostly use a variety of NLP techniques. The basic motive behind any question answering system is to provide a concise response to a question posed in natural language, saving the user time and effort. Nowadays a large amount of information is available on the internet and can be accessed easily. This information is useful to users, but its sheer volume makes it difficult for computer applications to select the most suitable item from a list of responses. Information extraction and retrieval from databases, the WWW and individual web sites is the most important application area for QAS. In this paper we survey various types of QAS: closed domain QAS, web based QAS, Information Retrieval or Information Extraction (IR/IE) based QAS, rule based QAS and open domain QAS.

2. System Overview

QAS is one of the most important applications of information retrieval. Its basic aim is to retrieve correct answers to a question posed in natural language from a collection of documents (such as the WWW or a local database). An efficient QAS requires more complex natural language processing (NLP) techniques than other information retrieval systems such as document retrieval, and hence it can be regarded as the next step beyond search engines [1][2]. QAS research attempts to deal with a wide range of question types, including factoid, long-answer, definition, how, wh-type, semantically complex and multilingual questions. QAS are classified into two main types [14]:

- Open domain QAS

- Closed domain QAS

Open domain question answering deals with questions about nearly anything, drawing on broad world knowledge and a general ontology. In return, these systems must handle a huge amount of data to extract the most relevant answer.

Closed domain question answering deals with questions within a specific domain (for example medicine or construction) and can be seen as an easier task than open domain QA, because NLP systems can exploit domain-specific knowledge that is frequently formalized in an ontology. Alternatively, closed domain might refer to a situation where only limited types of questions are accepted, such as questions asking for descriptive rather than procedural information [1][2].

A QAS consists of three main modules, each of which plays a vital role: 1) the question classification module, 2) the information retrieval module, and 3) the answer extraction module. The question classification module comes first and categorizes the question by its type.

2.1 Question classification module

To extract an answer from a large collection of databases and texts, the system must know what it is looking for, so it needs to classify questions by type [4]. Broadly, questions can be classified as factoid, long-answer, definition, how, wh-type, semantically complex, multilingual and hypothetical questions, among others (see Table 1 in reference [6] for the types of questions). Classifying the question correctly is very important: if this step goes wrong, it affects the working of the other modules. Once classification is done, the module derives the expected answer type, extracts the most relevant keywords, and reformulates the question into multiple semantically equivalent questions. Reformulating a query into queries of similar meaning is also known as query expansion, and it boosts the recall of the information retrieval system [6].
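As an illustration of the classification step, the following minimal Python sketch maps a question cue to an expected answer type and extracts keywords. The cue table, the stopword list and the function names are assumptions made for this example and are not taken from the cited systems; a real classifier would cover many more question types.

```python
import re

# Hypothetical cue-to-answer-type table (an assumption for illustration only).
ANSWER_TYPES = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "TIME",
    "how many": "QUANTITY",
    "how much": "QUANTITY",
}

# Small illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "the", "is", "was", "are", "of", "in",
             "did", "do", "does", "have", "has"}


def classify_question(question):
    """Return the expected answer type for a question, or "OTHER"."""
    text = question.lower().strip()
    # Try longer cues first so that "how many" is matched as a whole.
    for cue in sorted(ANSWER_TYPES, key=len, reverse=True):
        if re.match(re.escape(cue) + r"\b", text):
            return ANSWER_TYPES[cue]
    return "OTHER"


def extract_keywords(question):
    """Return the question tokens that are neither cue words nor stopwords."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    cue_words = {word for cue in ANSWER_TYPES for word in cue.split()}
    return [t for t in tokens if t not in STOPWORDS and t not in cue_words]


print(classify_question("Who wrote Hamlet?"))   # PERSON
print(extract_keywords("Who wrote Hamlet?"))    # ['wrote', 'hamlet']
```

The extracted keywords are what a later retrieval step would search for; query expansion could then add synonyms of each keyword.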

2.2 Information retrieval module

The task of the IR module is to make a first selection of paragraphs that are considered relevant to the input question. The recall of the information retrieval (IR) system is very important for question answering: if no correct answer is present in the retrieved documents, no further processing can find one. The precision and ranking of candidate passages in the IR phase can also affect question answering performance. This module is illustrated in Fig. 1 [7].
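Recall and precision can be made concrete with a short Python sketch. It uses the standard set-based definitions; the passage identifiers in the example are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query.

    precision = |retrieved and relevant| / |retrieved|
    recall    = |retrieved and relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical passage ids: four passages retrieved, three actually relevant.
p, r = precision_recall(["p1", "p2", "p3", "p4"], ["p2", "p4", "p9"])
print(p)  # 0.5
```

Here two of the four retrieved passages are relevant (precision 0.5), and two of the three relevant passages were found (recall 2/3). A relevant passage that is never retrieved, such as "p9", cannot contribute an answer in later stages.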

IR systems:

- Use statistical methods

- Rely on frequency of words in query, document, collection

- Retrieve complete documents
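The statistical, frequency-based nature of IR systems can be sketched with a simple TF-IDF ranking in Python. This is one common weighting scheme chosen here as an assumption, not necessarily the one used by the systems surveyed; the function names and the smoothed IDF formula are illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query, documents):
    """Return document indices ordered by a simple TF-IDF score, best first."""
    doc_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scored = []
    for idx, tokens in enumerate(doc_tokens):
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in tf:
                # Smoothed inverse document frequency (an illustrative choice).
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
                score += (tf[term] / len(tokens)) * idf
        scored.append((score, idx))

    # Highest score first; ties are broken by original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


docs = [
    "The Eiffel Tower is in Paris.",
    "Bananas are yellow.",
    "Paris is the capital of France.",
]
print(rank_documents("capital of France", docs))  # [2, 0, 1]
```

Note that the function returns whole documents, in line with the last point in the list above; locating the answer inside them is left to the answer extraction module.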

2.3 Answer extraction module

Answer extraction is the final component of a question answering system and is what distinguishes it from a text retrieval system in the usual sense. Answer extraction technology has a decisive influence on the final results of a question answering system, and it is therefore treated as a module in its own right.

The success of the information retrieval step is ultimately judged by whether the relevant answer can be extracted from what it returns. Answer extraction is an emerging topic in QAS research, and it often requires ranking and validating candidate answers.

Characteristics of QAS

QAS can be broadly categorized into two groups: the first relies on information retrieval methods together with NLP, while the second depends on reasoning over natural language. These QAS have distinct characteristics that can be compared along different dimensions, such as the techniques used, the domains covered, and the questions and responses they deal with. Table 1 gives the details of this comparison [5].

Closed domain QAS

A closed domain QAS works on a document collection that is restricted in subject and volume. This gives it characteristics that set it apart from other categories of QAS, especially open domain QA, which works over a large document collection that may include the WWW. In closed domain QA the database is quite small and specific to the targeted domain, so the correct answer to a query may be found in only a very few documents; the system does not have a large retrieval set with an abundance of good candidates to select from. To be usable as the question answering system of a company or organization, the QAS needs to answer all types of questions, whether simple or complex. The system should return a complete answer, which can be long and complex, because it has to, e.g., clarify the context of the problem posed in the question, explain the options of a service, or give instructions or procedures.

This makes techniques developed recently for open domain QA, particularly those within the TREC (Text REtrieval Conference) competitions (e.g. TREC, 2002), less helpful. These techniques, which aim at finding short and precise answers, are often based on the hypothesis that the questions are constituted by a single constituent and can be categorized into a well-defined and simple semantic classification (e.g. PERSON, TIME, LOCATION, QUANTITY, etc.). Closed domain QAS has a long history, beginning with systems working over databases (e.g., BASEBALL (Green et al., 1961) and LUNAR (Woods, 1973)). Recently, research in QAS has concentrated mostly on problems of open domain QAS, in particular on how to find a very precise and short answer. Nonetheless, researchers have begun to recognize the importance of long and complete answers. Lin et al. (2003) carried out experiments showing that users prefer an answer within context, e.g., an answer within its containing paragraph. Buchholz & Daelemans (2001) define some types of complex answers and propose that the system present a list of good candidates to the user and let the user construct the reply.

Harabagiu et al. (2001) mention the class of questions that need an answer in the form of an enumeration (a list answer).

Open domain QAS

The aim of an open domain question answering system is to respond to the user's question with a short text rather than a lengthy list of relevant documents. This type of system uses multiple techniques from computational linguistics, information retrieval and knowledge representation to search for answers.

As in other types of QAS, the query is accepted as a question in natural language. First the type of question is identified, and then an information retrieval system is used to find a set of documents containing the right keywords. A tagger and an NP/verb group chunker can be used to check whether the right entities and relations are mentioned in the retrieved documents. For questions such as "Who" or "Where", a named entity recognizer can be used to find the relevant person or location in the retrieved documents or database. The paragraphs relevant to the answer are then selected for ranking; a vector space model [12] can be used as a strategy for classifying the candidate answers. The system also needs to check that each answer is of the correct type, as determined in the question type analysis stage, and several techniques can be used to validate the candidate answers. A score is then given to each candidate according to the number of question words it contains and how close these words are to the candidate: the more and the closer, the better. The answer is then translated into a compact and meaningful representation by parsing, and finally it is passed to the user as the response to the query.
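The candidate scoring idea described above (more question words, and closer to the candidate, is better) can be sketched in Python as follows. The inverse-distance formula and the function names are assumptions made for illustration; the surveyed systems may weight words and distances differently.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_candidate(passage, candidate, keywords):
    """Score a candidate answer by nearby question keywords.

    Each keyword found in the passage contributes 1 / d, where d is the
    token distance from the candidate to the keyword's nearest occurrence.
    This weighting is a hypothetical choice for the sketch.
    """
    tokens = tokenize(passage)
    cand = tokenize(candidate)
    if not cand:
        return 0.0

    # Locate the first occurrence of the candidate's token sequence.
    starts = [i for i in range(len(tokens) - len(cand) + 1)
              if tokens[i:i + len(cand)] == cand]
    if not starts:
        return 0.0
    start = starts[0]
    end = start + len(cand) - 1

    score = 0.0
    for keyword in {k.lower() for k in keywords}:
        distances = [start - i if i < start else i - end
                     for i, tok in enumerate(tokens)
                     if tok == keyword and not (start <= i <= end)]
        if distances:
            score += 1.0 / min(distances)
    return score


passage = "Hamlet was written by William Shakespeare around 1600 in London."
keywords = ["hamlet", "written"]
print(score_candidate(passage, "William Shakespeare", keywords))  # 0.75
```

For the same passage and keywords, the candidate "London" scores lower because both keywords are further away from it, so "William Shakespeare" would be ranked first.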

The most important challenge for an open domain system is its database. The efficiency of any system depends on how well its database is organized and maintained, and this is especially true of an open domain QAS, which aims to answer questions about nearly everything.

Web based QAS

Nowadays the internet is a giant store of information. The tremendous amount of information available online makes the Web an ideal source of answers to a large variety of questions. The most important property of a web based QAS is that it is "snippet-tolerant": it can produce correct responses from the snippets returned by search engines such as Google or Yahoo. When a query is passed to a search engine, it returns a list of results in the form of web documents, each usually accompanied by its URL, its title and some string segments from the document. The title and string segments are the "snippets". Snippet tolerance matters for a web based QAS because it is an online system whose efficiency depends on the time needed to download web documents and then analyze them, and this time should be as short as possible.

The user submits the query to the system in natural language. The system first identifies the type of question, then submits the question to the search engine and grabs the top search results, each of which is a snippet. The system may use a Support Vector Machine (SVM) [9] to classify the questions. After the question type has been identified, the system extracts all information of that type from the snippets as plausible answers. For this it may use an HMM-based named entity recognizer [11] or any other suitable technique, together with some heuristic rules. For answer selection, snippet clustering [13] can be used: with the standard vector space model, it has been observed that the correct answer to a question usually occurs more often in that question's search results than the incorrect ones [13]. Finally, the final answer is evaluated. Examples of such QAS are the LAMP system [10] and AskMSR [15].
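The observation that the correct answer tends to occur more often across search results than incorrect ones suggests a simple voting scheme, sketched below in Python. This is a simplified stand-in for snippet clustering, not the method of [13]; the snippets are hand-written examples rather than real search results, and the year extractor is a hypothetical candidate extractor for WHEN-type questions.

```python
import re
from collections import Counter


def extract_year_candidates(snippet):
    """Hypothetical extractor: return four-digit years found in a snippet."""
    return re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", snippet)


def vote_on_answer(snippets, extract=extract_year_candidates):
    """Return the candidate that appears in the most snippets, or None."""
    votes = Counter()
    for snippet in snippets:
        # Count each candidate at most once per snippet.
        votes.update(set(extract(snippet)))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hand-written snippets standing in for search engine results.
snippets = [
    "The Berlin Wall fell in 1989.",
    "In 1989 the wall came down.",
    "Built in 1961, the wall fell in 1989.",
]
print(vote_on_answer(snippets))  # 1989
```

Because only the snippets are analyzed, no web documents need to be downloaded, which is the time saving that the snippet-tolerant property is meant to provide.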

Information Retrieval Or Information Extraction (IR/IE) QAS

Question answering, the process of extracting answers to natural language questions, is not simply Information Retrieval (IR) or Information Extraction (IE). IR systems locate documents relevant to a query, but they do not specify exactly where the answers are. IR uses query keyword matching to fetch documents from an indexed document collection. IE systems, on the other hand, extract the required information from the fetched documents, provided the domain of extraction is well defined. The information extracted by IE systems takes the form of slot fillers for predefined templates. QAS technology goes one step beyond IR and IE systems: it uses both IR and IE and provides exact and concise answers formulated naturally [16].
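To show what "slot fillers of predefined templates" means in practice, the following Python sketch fills a single hypothetical "birth" template with PERSON, YEAR and PLACE slots using a regular expression. The template, pattern and function name are assumptions for illustration; real IE systems use far richer linguistic analysis than one pattern.

```python
import re

# Hypothetical template for a "birth" event with three slots.
BIRTH_PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in "
    r"(?P<year>\d{4}) in (?P<place>[A-Z][a-z]+)"
)


def fill_birth_template(text):
    """Return a dict of filled slots, or None if the template does not match."""
    match = BIRTH_PATTERN.search(text)
    if match is None:
        return None
    return match.groupdict()


slots = fill_birth_template("Marie Curie was born in 1867 in Warsaw.")
print(slots)  # {'person': 'Marie Curie', 'year': '1867', 'place': 'Warsaw'}
```

Once the slots are filled, a question such as "Where was Marie Curie born?" can be answered by reading the `place` slot, which is how IE complements the document-level output of IR.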

IR and IE systems differ in their roles. An IR system works on the interaction between human and computer when searching for the answer to a posed query. Its efficiency depends on how well the machine is programmed to match the user's query against the available documents and return the most relevant ones. An IR system retrieves the most relevant documents from the available database, but it alone cannot give the exact answer. This is where IE comes in: IE systems extract the correct answer from the retrieved documents using various natural language processing techniques. Both kinds of system demand a well organized and maintained database. IR systems face various challenges in proving their efficiency [18].

Rule based QAS

This kind of system is one of the most important and efficient QAS. Its basic application is reading comprehension. In the United States, the reading ability of children is generally evaluated with reading comprehension tests. In such a test, children are given a short story as a paragraph; they need to understand it and then answer the questions that follow the story.

Understanding a story is an easy task for children compared with a computer system, which is an electronic device that must be programmed to perform any task. For a computer system to take a reading comprehension test, it must first be given a program that lets it understand the relevant aspects of the story well enough to answer the questions correctly. Such a program uses natural language processing together with lexical and semantic heuristics, because this level of understanding is difficult to achieve with broad-coverage techniques. Reading comprehension tests are difficult and challenging because they can cover nearly any topic.

Developing a rule based QAS is a challenging task, as the developer needs to consider virtually all the topics on which the system may be tested. The basic types of questions that can be asked are well known [5]; the types generally covered are WHO, WHAT, WHEN, WHERE and WHY. For a rule based QAS the developer needs to treat each type as a separate group and implement separate rules for each, because each type of question looks for a different kind of answer. For example, a WHO question looks for a PERSON NAME as its answer, while a WHERE question looks for a LOCATION.

Once a question is put to a rule based system, the first task is to parse it with a parser. Syntactic analysis is optional. After this, the system applies NLP techniques such as morphological analysis, part-of-speech tagging, semantic class tagging and entity recognition. Hand-crafted rules can then be used to get the correct answer from the given story. These rules are applied to every sentence in the story, including the title, although the title is not considered for WHY questions. Each rule awards a predefined number of points to each sentence. Rules such as dateline (for WHEN and WHERE questions) and the wordmatch function can be applied to the sentences as necessary. Finally, the sentence with the highest score is returned as the answer. Writing rules is also a tough job, as there are many ways to write them [19].
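The point-awarding scheme described above can be sketched in Python as follows. The word-match rule, the two type-specific rules, the point values and the stopword list are all assumptions made for this example and are not the actual rule set of [19]; only WHO and WHEN questions are handled.

```python
import re

# Small illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "was", "were", "did", "do", "does",
             "in", "at", "of", "to", "who", "what", "when", "where", "why"}


def content_words(text):
    """Return the set of lowercase tokens that are not stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def word_match(question, sentence):
    """Word-match rule: one point per content word shared with the question."""
    return len(content_words(question) & content_words(sentence))


def has_name(sentence):
    """Crude name test: a capitalized word that is not sentence-initial."""
    return any(word[0].isupper() for word in sentence.split()[1:])


def score_sentence(question, sentence):
    """Sum the points awarded by each rule (point values are hypothetical)."""
    q = question.lower()
    score = word_match(question, sentence)
    if q.startswith("who") and has_name(sentence):
        score += 2  # WHO questions look for a person name
    if q.startswith("when") and re.search(
            r"\b(\d{4}|yesterday|today|tomorrow)\b", sentence.lower()):
        score += 2  # WHEN questions look for a time expression
    return score


def answer(question, story_sentences):
    """Return the highest-scoring sentence of the story."""
    return max(story_sentences, key=lambda s: score_sentence(question, s))


story = [
    "Tom found a lost puppy in the park.",
    "The puppy was very small.",
    "Yesterday Anna took the puppy home.",
]
print(answer("Who took the puppy home?", story))
# Yesterday Anna took the puppy home.
```

Keeping each question type in its own rule makes it easy to add further rules, such as a dateline rule or a WHERE rule that looks for location words, without touching the others.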


