Generic Text Summarization Using WordNet


Introduction

In this article, we discuss the purpose of the project, giving a brief overview of it, together with the problem statement that led us to select this project.

Purpose:

The objective of this project is to fulfil an academic requirement. The time allowed for its completion is one year, and the project is intended primarily for learning purposes. The scope of the project is discussed in later articles; the objective itself is described below:

Pedagogical Agent:

Our project, the Pedagogical Agent, is a web application that searches the large body of content held in its repository and returns the required information. The search is performed on the basis of a keyword, phrase or sentence, and all of this text is handled by natural query processing. Interactive quizzes can be conducted on the articles returned by a search, and the application can also provide a summary of the content. The project is research based. At present, searching is performed efficiently over the material in the repository and quizzes can be conducted. Our aim is to develop an application that performs efficient search, supplies the user with the required data through natural query processing, gives the user a data summarization facility, and makes interactive use of the content of research papers, white papers, electronic articles and search engine results. The application mines and arranges content in a way that helps the user extract, understand and learn any area. The content is arranged in the format: Definition, Purpose (the five Ws), Detail, Implementation, Example and Summary. Apart from returning a relevant article in response to a search, the application also provides the following functionalities:

Word Meaning Walk-around: While reading an article, the user may encounter difficult vocabulary. The application provides the meaning to the user directly, instead of the user having to refer to a dictionary or a web search.

Technical Terminology Understanding: While reading an article, the user may come across a technical term of which they have little awareness; the application provides its definition to the user.

Problem:

There are numerous research papers, white papers and articles available, but when one needs to find an article for a project topic within a subject, present technology is of little help. The explosive growth in online information is making it harder for large, globally distributed organizations to foster collaboration and leverage their intellectual assets. Information about any topic on the internet is widely scattered; even when the topic is found, the reader cannot be sure that the information is correct and sufficient, and the material often lacks clarity. The Pedagogical Agent presents a topic with clarity, covering the five Ws, and presents the searched topic from different aspects.

To learn a topic on the internet, a user has to switch between many different websites, and it takes a great deal of time to cover the full topic with full understanding.

Hence the need for our proposed system, which enables efficient use of electronic articles and content from all sources, and creates a meaningful article for the user as required.

Structure of the Report:

This report contains a detailed explanation of our project. It also covers the features identified through the surveys and analysis conducted to discover the need for reading and for replicating human intelligence programmatically.

The important topics that the report covers are as follows:

Background and Literature Review: This article describes the current state of knowledge concerning the subject involved in the project. It also mentions the literature on the subject and highlights the information relevant to the activity undertaken in the project.

Aim and Statement of Problem: This article states the aim of the project clearly, with sufficient explanation to make it easily understandable. It also defines the scope of the project.

Analysis and Design: This article mentions how requirements were captured, comparison of algorithms and data structures. It also describes the surveys performed for identifying the problems involved. The article also covers UML (Unified Modeling Language) diagrams. The purpose of using these diagrams was that they are useful to identify the functional requirements of the system.

Implementation: This article gives code details. It discusses the important aspects of code.

Testing: This article presents the test plan, how the system/program was verified.

Discussion: This article discusses the progression of the project including the methods used and the results of experimentation, or the design; in such a way that examiner can evaluate the worth of the project.

Conclusion: This section presents a concise statement of the conclusions which may be drawn from the work attempted. This article also discusses the uncertainties that remain in the project.

Background and Literature Review

"After presenting the project proposal, we started searching for the information related to our project. As the project is completely research base, we didn’t come up with the project similar to our project but we also came up with number of research papers that proved quite knowledgeable to us, while analyzing and designing the system. Following is the information discussed:

Generic Text Summarization using WordNet

Various algorithms for text summarization have been developed, but none of them selects sentences based on the semantic content of the sentence. In this research paper, the summarization algorithm is based on identifying semantic relations, and WordNet is used to understand the links between sentences.

Brief Introduction to WordNet

"Word Net (Fellbaum, 1998) is an online lexical reference system in which English nouns, verbs, adjectives and adverbs are organized into synonym sets or synsets, each representing one underlying lexical concept. Noun synsets are related to each other through hypernymy (generalization), hyponymy (specialization), holonymy (whole of) and metonymy (part of) relations. Of these, (hypernymy, hyponymy) and (metonymy, holonymy) are complementary pairs.

The verb and adjective synsets are very sparsely connected with each other. No relation is available between noun and verb synsets."[1]

Algorithm for Text Summarization using WordNet

The heart of the algorithm is the computation of global semantic information, which captures the overall meaning of the text using WordNet. [1]

The algorithm has the following five steps:

1. Preprocessing of the text:

Break the text into sentences and apply part-of-speech tagging to the words in the text. This is essential for picking the correct sense of a word in WordNet; hence, if the word "pant" is used in the text as a verb, we will not associate it with a form of clothing. [1]

a. Identify collocations in the text. A collocation is a group of commonly co-occurring words, for example, "miles per hour". [1]

b. Identifying collocations helps in capturing the meaning of the text better than that of the individual words (just like any idiom). [1]

c. Remove stop words like "the", "him" and "had" which do not contribute to understanding the main ideas present in the text. [1]
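As a rough illustration of this preprocessing step, the sketch below is written in C# (the language chosen for our project). The sentence splitter and the small stop-word list are simplifying assumptions for illustration only, not the paper's implementation, and part-of-speech tagging and collocation detection are left to an external tagger.

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class Preprocessor
{
    // Illustrative stop-word list; a real system would load a much fuller list.
    static readonly HashSet<string> StopWords =
        new HashSet<string> { "the", "a", "an", "him", "had", "is", "of", "to" };

    // Naive sentence splitter on terminal punctuation.
    public static List<string> SplitSentences(string text)
    {
        return Regex.Split(text, @"(?<=[.!?])\s+").Where(s => s.Length > 0).ToList();
    }

    // Lower-cases, tokenises and removes stop words from one sentence.
    public static List<string> ContentWords(string sentence)
    {
        return Regex.Matches(sentence.ToLowerInvariant(), @"[a-z]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Where(w => !StopWords.Contains(w))
                    .ToList();
    }
}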

2. Constructing sub-graph from WordNet: We find the portion of the WordNet graph which is relevant to our text, i.e. those synsets whose meaning occurs in the text. First we mark all the words and collocations in the WordNet graph which are present in our text. [1]

We then traverse the generalization edges of the WordNet graph starting from these marked words, and keep marking all the synsets which are reachable from the marked words. We do a breadth-first search and traverse the graph only to a fixed depth, as the meanings of synsets become too general to be considered thereafter. Finally, we construct a graph G containing only the marked words and marked synsets as nodes and the generalization links between them as the edges. [1]
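A minimal C# sketch of this depth-limited breadth-first traversal is given below; the dictionary mapping each synset id to the ids of its generalizations is an assumed in-memory representation of the WordNet graph, not WordNet's real API.

using System.Collections.Generic;

static class SubGraphBuilder
{
    // seedSynsets: synsets of words found in the text; hypernyms: assumed
    // adjacency list of generalization (hypernym) links; maxDepth: fixed
    // traversal depth after which meanings become too general.
    public static HashSet<int> MarkSynsets(IEnumerable<int> seedSynsets,
                                           Dictionary<int, List<int>> hypernyms,
                                           int maxDepth)
    {
        var marked = new HashSet<int>(seedSynsets);
        var frontier = new Queue<KeyValuePair<int, int>>();   // (synset, depth)
        foreach (var s in marked)
            frontier.Enqueue(new KeyValuePair<int, int>(s, 0));

        while (frontier.Count > 0)
        {
            var current = frontier.Dequeue();
            if (current.Value >= maxDepth) continue;
            List<int> parents;
            if (!hypernyms.TryGetValue(current.Key, out parents)) continue;
            foreach (var p in parents)
                if (marked.Add(p))   // mark each newly reached synset
                    frontier.Enqueue(new KeyValuePair<int, int>(p, current.Value + 1));
        }
        return marked;   // nodes of the sub-graph G
    }
}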

3. Synset Ranking: The basic motivation of this step is to rank the synsets based on their relevance to the text. So, if many words in the text correspond to the same synset, that synset or 'meaning' is more relevant to the text and must get a higher rank. This idea has been borrowed from (Ramakrishna and Bhattacharya, 2003), which details the use of WordNet synsets as a mode of text representation. [1]
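As an illustration of the counting idea behind this step (not the exact ranking scheme of Ramakrishna and Bhattacharya), a C# sketch that ranks synsets by how many words of the text map to them:

using System.Collections.Generic;
using System.Linq;

static class SynsetRanker
{
    // wordToSynsets is an assumed structure mapping each content word of the
    // text to the marked synsets it belongs to. Synsets covered by more words
    // receive a higher rank.
    public static List<int> RankSynsets(Dictionary<string, List<int>> wordToSynsets)
    {
        var counts = new Dictionary<int, int>();
        foreach (var synsets in wordToSynsets.Values)
            foreach (var synset in synsets)
                counts[synset] = counts.ContainsKey(synset) ? counts[synset] + 1 : 1;

        return counts.OrderByDescending(kv => kv.Value)   // most relevant first
                     .Select(kv => kv.Key)
                     .ToList();
    }
}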

4. Sentence Selection: The synset ranking algorithm gives us information about which synsets or meanings are more relevant to the text, the relevance being reflected in the rank of the synset. [1]

5. Final filtering: The last stage of our algorithm involves the application of simple heuristics to filter out the sentences which have undefined references. The heuristics applied are: removing sentences which contain words like "He", "It" etc. at the beginning, and removing sentences which begin with quotes. [1]
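A simple C# sketch of these filtering heuristics follows; the list of dangling sentence openers extends the paper's "He", "It" examples and is only illustrative.

using System.Collections.Generic;
using System.Linq;

static class SummaryFilter
{
    // Illustrative extension of the "He", "It", etc. list of unresolved references.
    static readonly string[] DanglingStarts = { "He", "It", "She", "They" };

    // Drops sentences that begin with a quote or with an undefined reference.
    public static List<string> Filter(IEnumerable<string> sentences)
    {
        return sentences
            .Where(s => !s.StartsWith("\"") && !s.StartsWith("'"))
            .Where(s =>
            {
                var firstWord = s.Split(new[] { ' ' }, 2)[0].TrimEnd(',', ':', ';');
                return !DanglingStarts.Contains(firstWord);
            })
            .ToList();
    }
}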

Using Lexical Chains for Text Summarization

This research paper investigates a technique to produce a summary of an original text without requiring its full semantic interpretation, relying instead on a model of the topic progression in the text derived from lexical chains. It presents a new algorithm to compute lexical chains in a text, merging several robust knowledge sources: the WordNet thesaurus, a part-of-speech tagger, a shallow parser for the identification of nominal groups, and a segmentation algorithm. Summarization proceeds in four steps: the original text is segmented, lexical chains are constructed, strong chains are identified and significant sentences are extracted. [2]

Lexical chains provide a representation of the lexical cohesive structure of the text. Lexical chains have also been used for information retrieval (Stairmand 1996) and for correction of malapropisms (Hirst & St-Onge 1998 to appear). Lexical cohesion is created by using semantically related words. This paper investigates how lexical chains can be used as a source representation for summarization. [2]

Algorithm for Chain Computing

"One of the chief advantages of lexical cohesion is that it is an easily recognizable relation, enabling lexical chain computation. The first computational model for lexical chains was presented in the work of Morris and Hirst (Morris & Hirst 1991). They define lexical cohesion relations in terms of categories, index entries and pointers in Roget’s Thesaurus. Morris and Hirst evaluated that their relatedness criterion covered over 90% of the intuitive lexical relations. Chains are created by taking a new text word and finding a related chain for it according to relatedness criteria. Morris and Hirst introduce the notions of "activated chain" and "chain returns", to take into account the distance between occurrences of related words. They also analyze factors contributing to the strength of a chain — repetition, density and length. Morris and Hirst did not implement their algorithm, because there was no machine-readable version of Roget’s Thesaurus at that time.

One of the drawbacks of their approach was that they did not require the same word to appear with the same sense in its different occurrences for it to belong to a chain. For semantically ambiguous words, this can lead to confusions (e.g., mixing two senses of table as a piece of furniture or an array). Note that choosing the appropriate chain for a word is equivalent to disambiguating this word in context, which is a well-known difficult problem in text understanding.

More recently, two algorithms for the calculation of lexical chains have been presented in (Hirst & St-Onge 1998 to appear) and (Stairmand 1996). Both of these algorithms use the WordNet lexical database for determining relatedness of the words (Miller et al. 1990). Senses in the WordNet database are represented relationally by synonym sets ('synsets'), which are the sets of all the words sharing a common sense.

For example, two senses of "computer" are represented as: {calculator, reckoner, figurer, estimator, computer} (i.e., a person who computes) and {computer, data processor, electronic computer, information processing system}. WordNet contains more than 118,000 different word forms. Words of the same category are linked through semantic relations like synonymy and hyponymy.

Polysemous words appear in more than one synset (for example, computer occurs in two synsets). Approximately 17% of the words in WordNet are polysemous. But, as noted by Stairmand, this figure is very misleading: "a significant proportion of WordNet nouns are Latin labels for biological entities, which by their nature are monosemous, and our experience with the news-report texts we have processed is that approximately half of the nouns encountered are polysemous." (Stairmand 1996)" [2]

"Generally, a procedure for constructing lexical chains follows three steps:

1. Select a set of candidate words;

2. For each candidate word, find an appropriate chain relying on a relatedness criterion among members of the chains;

3. If it is found, insert the word in the chain and update it accordingly.

An example of such a procedure is presented by Hirst and St-Onge (henceforth, H&S). In the preprocessing step, all words that appear as a noun entry in WordNet are chosen. Relatedness of words is determined in terms of the distance between their occurrences and the shape of the path connecting them in the WordNet thesaurus. Three kinds of relations are defined: extra-strong (between a word and its repetition), strong (between two words connected by a WordNet relation) and medium-strong when the link between the synsets of the words is longer than one (only paths satisfying certain restrictions are accepted as valid connections).

The maximum distance between related words depends on the kind of relation: for extra-strong relations there is no limit on distance; for strong relations it is limited to a window of seven sentences; and for medium-strong relations it is within three sentences back. To find a chain in which to insert a given candidate word, extra-strong relations are preferred to strong relations, and both of them are preferred to medium-strong relations. If a chain is found, then the candidate word is inserted with the appropriate sense, and the senses of the other words in the receiving chain are updated, so that every word connected to the new word in the chain relates to its selected senses only. If no chain is found, then a new chain is created and the candidate word is inserted with all its possible senses in WordNet." [2]
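The following C# sketch illustrates the shape of this insertion procedure under the H&S distance windows. The Relatedness function is a placeholder (here it only recognizes repetition) where a real implementation would consult WordNet, and the updating of word senses is omitted.

using System.Collections.Generic;

enum Relation { None, MediumStrong, Strong, ExtraStrong }

class Chain
{
    public List<string> Words = new List<string>();
    public List<int> SentenceIndexes = new List<int>();
}

static class ChainBuilder
{
    // Placeholder relatedness test; a real version would look up WordNet links.
    static Relation Relatedness(string w1, string w2)
    {
        return w1 == w2 ? Relation.ExtraStrong : Relation.None;
    }

    // Distance windows from the H&S procedure: unlimited for extra-strong,
    // seven sentences for strong, three sentences for medium-strong.
    static bool WithinWindow(Relation r, int distance)
    {
        if (r == Relation.ExtraStrong) return true;
        if (r == Relation.Strong) return distance <= 7;
        if (r == Relation.MediumStrong) return distance <= 3;
        return false;
    }

    public static void Insert(List<Chain> chains, string word, int sentenceIndex)
    {
        Chain best = null;
        Relation bestRelation = Relation.None;
        foreach (var chain in chains)
            for (int i = 0; i < chain.Words.Count; i++)
            {
                var r = Relatedness(word, chain.Words[i]);
                if (r > bestRelation && WithinWindow(r, sentenceIndex - chain.SentenceIndexes[i]))
                {
                    bestRelation = r;
                    best = chain;
                }
            }

        if (best == null)
        {
            best = new Chain();        // no related chain found: start a new one
            chains.Add(best);
        }
        best.Words.Add(word);
        best.SentenceIndexes.Add(sentenceIndex);
    }
}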

Building Summaries Using Lexical Chains

It now has to be investigated how lexical chains can serve as a source representation of the original text from which to build a summary, and how a summary representation can be built from that source representation. The technique used is as follows:

Scoring Chain:

First, the strongest chains produced by the algorithm need to be identified; we therefore measure the strength of a chain using the following parameters:

Length: The number of occurrences of members of the chain.

Homogeneity index: 1 - the number of distinct occurrences divided by the length.

A score function for chains is defined as:

Score(Chain) = Length × HomogeneityIndex

When ranking chains according to their score, we evaluated that strong chains are those which satisfy our "Strength Criterion": Score(Chain) > Average(Scores) + 2 × StandardDeviation(Scores) [2]
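These two formulas translate directly into code; the C# sketch below reuses the Chain type from the earlier sketch and is only an illustration of the scoring and strength criterion.

using System;
using System.Collections.Generic;
using System.Linq;

static class ChainScoring
{
    // Score(Chain) = Length x HomogeneityIndex, with
    // HomogeneityIndex = 1 - (distinct members / length).
    public static double Score(Chain chain)
    {
        double length = chain.Words.Count;
        if (length == 0) return 0;
        double distinct = chain.Words.Distinct().Count();
        return length * (1.0 - distinct / length);
    }

    // Strong chains satisfy: Score > Average(Scores) + 2 x StandardDeviation(Scores).
    public static List<Chain> StrongChains(List<Chain> chains)
    {
        var scores = chains.Select(Score).ToList();
        double mean = scores.Average();
        double stdDev = Math.Sqrt(scores.Average(s => (s - mean) * (s - mean)));
        return chains.Where(c => Score(c) > mean + 2 * stdDev).ToList();
    }
}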

Extracting Significant Sentences

"Once strong chains have been selected, the next step of the summarization algorithm is to extract full sentences from the original text based on chain distribution.

Following are three alternatives for this step:

Heuristic 1: For each chain in the summary representation, choose the sentence that contains the first appearance of a chain member in the text.

Heuristic 2: For each chain in the summary representation, choose the sentence that contains the first appearance of a representative chain member in the text.

Heuristic 3: For each chain, find the text unit where the chain is highly concentrated. Extract the sentence with the first chain appearance in this central unit. Concentration is computed as the number of chain members occurrences in a segment divided by the number of nouns in the segment. A chain has high concentration if its concentration is the maximum of all chains.

All these three techniques extract only one sentence for each chain (regardless of its strength)." [2]
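As a small illustration, Heuristic 1 can be sketched in C# as follows, again reusing the Chain type from the earlier sketch; Heuristics 2 and 3 would additionally need representative chain members and segment information.

using System.Collections.Generic;
using System.Linq;

static class SentenceExtractor
{
    // Heuristic 1: for each strong chain, take the sentence containing the
    // first appearance of any chain member; duplicates are collapsed and the
    // picked sentences are returned in text order.
    public static List<string> ExtractHeuristic1(List<Chain> strongChains, List<string> sentences)
    {
        var picked = new SortedSet<int>();
        foreach (var chain in strongChains)
            if (chain.SentenceIndexes.Count > 0)
                picked.Add(chain.SentenceIndexes.Min());
        return picked.Select(i => sentences[i]).ToList();
    }
}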

MIT project

Preface:

This document gives a brief description of the project undertaken by William Billingsley and Peter Robinson of Cambridge University. The project, named the "Pedagogical Agent", has a duration of three years.

The project aims to improve online education through a reactive learning environment. Its purpose is to create an environment consisting of first-year undergraduate exercises in which students write proofs in number theory, because automated proofs are considered difficult for novice users. The system asks the students to write out the proofs, which are constructed through MathsTiles, a notation that resembles written mathematics.

Unlike traditional syntax-directed editors, MathsTiles allow students to keep many answer fragments on the canvas at the same time and do not constrain the order in which an answer is written. Also, the tile syntax does not need to match the underlying syntax exactly, and different tiles can be used for different questions. The exercises take place within the context of a Pedagogical Agent, and the article responds to the reader. Figure 2.4.1 (in the appendix) shows MathsTiles.

The Pedagogical Agent project aims to improve online education by designing materials that can model the subject matter they teach. For example, a question designed for electronics students asks them to choose the values of current and voltage for a transistor amplifier; at the back end an AI model checks that the values entered by the student conform to the rules of electronics. For this purpose, the Cambridge team aims to design a special interface that helps students learn more quickly with less training, and they claim that novice users would also feel comfortable using the system. To reduce the learning barriers for new users, they apply human-computer interaction research techniques.

This is a web-based text article that contains AI-supported exercises as well as extensible content, meaning that students can add content to the article, and the article may give different explanations to different users for the same topic.

The system asks untrained students to write proofs in number theory on their own instead of asking the system to do so. Although there are many systems that ask students to write verifiable proofs in simplified domains such as predicate logic, this is the first time a system asks students to write verifiable proofs in number theory.

Goals:

The concept of the Pedagogical Agent is a web-based, efficient search that contains AI-supported exercises and content. Students are allowed to add further content and to make changes to it.

The system asks the student to write the proofs themselves rather than producing them on its own. When the students write the proofs, their work should conform to the rules of electronics.

The system provides an interface in which students do not need to write algebraic expressions in a hierarchical manner and do not need to follow a top-down approach; they can start from anywhere and link the pieces together.

Architecture:

The question architecture used when a student works on Pedagogical Agent questions related to electronics must support electronic notations and diagrams; other questions might involve state diagrams, circuit diagrams or timing diagrams. The following diagram is taken from the research paper written by William Billingsley and Peter Robinson; Figure 2.4.2 (in the Appendix) shows the architecture of the system.

The system needs the server to check the student's work. Rather than writing on the client, the student writes on a remote document held on the server: the student gets a local copy, and as the student works, the modifications are applied to the original document on the server. The exercises take place within a web page. For this purpose, full HTML components are used so that diagrams and notations can be displayed, and the HTML page is dynamically updated from the server. JavaScript-backed controls are used to make alterations to the client and server documents. Content applets, built on the internal architecture, are used to support diagrams. When the student makes any change in the document, it is reported to the server; XML-RPC applets are used for communication between client and server. Teaching scripts describe how to respond to the student, and conversion scripts process the student's document and pass it to the external AI. A broker keeps a pool of processes to handle requests; this is needed when the external AI is a process rather than a module.

Reordering sentences

This research paper focuses on reordering the sentences of a coherent text. In the paper, a reordering algorithm has been developed to reorder sentences automatically, and it gave us an idea for designing such an algorithm. The research question that the paper attempts to answer is: "how good is a computer at reordering sentences compared to humans?"

Approach:

The first part of this research was thus dedicated to finding an existing solution to the problem of reordering sentences. If such a method was found, it would be applied and evaluated. After some thorough searching on the web, however, it quickly appeared that such an algorithm did not exist yet. The next step was to design an algorithm, which was to be tested and compared with human reordering. Dutch was chosen as language of the sentences to be ordered, since this is the native language of the persons performing this research. [3]

Theory

As a first step in designing methods to give hints on sentence order, a simple text was shuffled and reordered by hand. While doing this, special attention was given to why sentences should have certain orderings. After some manual reordering, it appeared that (anaphoric) relations between pronouns and certain repetition vehicles were the key "guides" to human reordering. Furthermore, repetition, cue words and world knowledge appeared to guide humans in reordering shuffled texts. [3]

1 Anaphoric relation

Anaphora provides strong inter-sentential links. In (complete) discourse, each anaphor has a word to which it refers. In general, it can be assumed that this word is introduced before the anaphor is used. It is thus expected that anaphoric relations can help in reordering texts, since humans use these "anchors" when reordering sentences. [3]

2 Pronouns

Pronouns "refer" to words possibly in other sentences (or in the same sentence). When a simple text is shuffled, anaphoric relations between demonstrative pronouns and nouns provide relatively strong anchors. A demonstrative pronoun will almost always be preceded by a noun with the same features (cardinality, gender, etc.).[3]

3 Repetitions

Humans are able to recognize sentence order by recognizing the repetition of certain elements in sentences. Work by [BN00] suggests that repetition "vehicles" support the temporal structure of discourse, focusing on document summarization. This repetition, however, seemed to be of little help for the automatic reordering of sentences, since it only provides a kind of confirmation that the sentences should be clustered together. Because this research deals with the reordering of simple texts, this was already obvious. [3]

4 Cue phrases

Cue phrases have a high potential to provide hints on the order of parts of texts when used in a discourse sense. It is quite trivial to reproduce the order of cue phrases; however, they do not seem to appear very often in the (small and simple) texts at which this research is targeted. [3]

5 World-knowledge

The last, but certainly not the least, vehicle for sentence reordering is world knowledge. Humans often know that certain sentences should be ordered in a particular way without any linguistic proof. An algorithm with only linguistic knowledge would not be able to (correctly) predict the order of these sentences. This source of ordering knowledge is one of the most powerful vehicles a human would use; it is, however, the most difficult to model. [3]

Design overview

The general idea before the actual design started was to perform topological sort on a collection of individual orderings on sentences (partial orderings), i.e.: if some method provides knowledge that sentence x precedes sentence y, sentence x will be ordered before y. The algorithm (algorithm-1 in Appendix) will combine all partial orderings and perform a topological sort on the union of those orderings. [3]

Topological Sort

A topological sort is an ordering of the vertices of a directed acyclic graph given by the following definition [PE97]:

Consider a directed acyclic graph G = (V, E). A topological sort of the vertices of G is a sequence S = (v1, v2, ..., v|V|) in which each element of V appears exactly once. For every pair of distinct vertices vi and vj in the sequence S, if vi -> vj is an edge in G, i.e., (vi, vj) ∈ E, then i < j.

Individual sentence orderings were modeled as a set of edges: si -> sj is an edge in the partial sort if and only if sentence i precedes sentence j. The actual sort is performed by the following steps:

1. Initialize the total sort as an empty set

2. Repeat until no nodes left

3. Pick a node having no incoming edges

4. Append this node to the total sort

5. Remove the node from V

6. Remove all edges from E originating in the node

7. Continue with step 2

A problem was, however, that the partial ordering modeled by graph G should be acyclic, which is not guaranteed yet. If graph G contains cycles, step 3 is not possible.
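Assuming for the moment that the partial ordering is acyclic (cycle handling is discussed next), the listed steps correspond to the standard sort sketched below in C#; the edge representation is our own illustrative choice.

using System.Collections.Generic;
using System.Linq;

static class TopologicalSorter
{
    // edges[i] holds the sentences that must come after sentence i.
    public static List<int> Sort(int sentenceCount, Dictionary<int, List<int>> edges)
    {
        var incoming = new int[sentenceCount];            // incoming-edge counts
        foreach (var targets in edges.Values)
            foreach (var t in targets)
                incoming[t]++;

        var ready = new Queue<int>(
            Enumerable.Range(0, sentenceCount).Where(n => incoming[n] == 0));
        var order = new List<int>();

        while (ready.Count > 0)
        {
            int node = ready.Dequeue();                   // node with no incoming edges
            order.Add(node);
            List<int> targets;
            if (!edges.TryGetValue(node, out targets)) continue;
            foreach (var t in targets)                    // remove its outgoing edges
                if (--incoming[t] == 0)
                    ready.Enqueue(t);
        }
        return order;                                     // total order of the sentences
    }
}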

Conflicts and problems

A topological sort is possible if and only if the partial sort is a graph containing no cycles. Any cycle will cause the topological sort to fail, so some kind of conflict resolution had to be applied in order to provide a generic and robust sorting framework. In order to resolve conflicts, cycles had to be detected, which was done by a tree-visiting technique inspired by [Par03]. When a node is visited, it flags itself as being visited, visits its children (if applicable) and then flags itself as done. If a node whose flag is being visited is visited again, this proves the existence of a cycle in the graph. The technique can be extended to point out the path "causing" the cycle instead of just proving its existence, and it was applied from each node of the graph, since the graph consists of more than one component (i.e., it is a "forest"). The result of this step is a set of paths causing conflicts, which were to be addressed next. This way of identifying cycles is not the most efficient, but since the number of sentences (and thus nodes) is relatively low, it does not harm the overall resource consumption of the algorithm. [3]

Resolution

Now that the conflicting (cyclic) paths had been identified, something had to be done to eliminate the conflict. The edges representing a partial ordering were enhanced with a value indicating a certain confidence level. When a path causing a cycle was identified, the weakest edge (the one with the lowest confidence) in that path was located and marked for removal. In a final phase, all edges marked for removal were removed from the partial ordering. This technique was repeated until the graph was reduced to an acyclic one. [3]
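A C# sketch of this weakest-edge removal is given below; the OrderingEdge type and its confidence values are our own illustrative representation of the weighted partial ordering.

using System.Collections.Generic;
using System.Linq;

class OrderingEdge
{
    public int From;
    public int To;
    public double Confidence;   // how strongly a hinter believes From precedes To
}

static class ConflictResolver
{
    // Given one detected cycle (as the list of edges forming it), drop the edge
    // with the lowest confidence from the partial ordering. Repeating this for
    // every detected cycle reduces the graph to an acyclic one.
    public static void BreakCycle(List<OrderingEdge> cycle, List<OrderingEdge> partialOrdering)
    {
        if (cycle.Count == 0) return;
        var weakest = cycle.OrderBy(e => e.Confidence).First();
        partialOrdering.Remove(weakest);
    }
}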

Hinters

Partial orderings are provided by "hinters": modules (or plug-ins) which produce a set of orderings given a set of sentences. The program can have as many hinters as desired, merging their results together before performing the final topological sort. Since all hinters have a common structure, they are implemented in a generic way. [3]

Pronouns

To discover a possible relationship between a demonstrative pronoun and a noun, a naive algorithm was designed. In the first phase, the algorithm collects possible (demonstrative) pronouns and nouns, associating them with an identification key to the sentences in which they were found. In a second phase, the "candidates" are matched against each other. [3]

Repetition

In order to use the knowledge of repetition guiding sentence order, repeating nouns are detected and matched against each other. When a noun occurs in sentences si and sj, it is assumed that when i != j either i -> j or j -> i is an edge in the partial sort. [3]
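A possible C# sketch of such a repetition hinter is shown below, reusing the OrderingEdge type from the earlier sketch. Since repetition alone does not say which of the two sentences comes first, the sketch arbitrarily proposes one direction with a low confidence value; the nouns per sentence are assumed to come from the tagging step.

using System.Collections.Generic;

static class RepetitionHinter
{
    public static List<OrderingEdge> Hint(List<HashSet<string>> nounsPerSentence)
    {
        var edges = new List<OrderingEdge>();
        for (int i = 0; i < nounsPerSentence.Count; i++)
            for (int j = i + 1; j < nounsPerSentence.Count; j++)
                if (nounsPerSentence[i].Overlaps(nounsPerSentence[j]))
                    // A shared noun only clusters the sentences; the direction
                    // is a weak, arbitrary guess, hence the low confidence.
                    edges.Add(new OrderingEdge { From = i, To = j, Confidence = 0.1 });
        return edges;
    }
}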

Pronoun/noun resolving

The design was implemented in Java and was tested on Dutch sentences. Several classes were created (see Algorithm 2 in the Appendix).

Aim and Statement of Problem

Problem Statement:

There are numerous research papers, white papers and articles available, but when one needs to find an article for a project topic within a subject, present technology is of little help. The explosive growth in online information is making it harder for large, globally distributed organizations to foster collaboration and leverage their intellectual assets. Information about any topic on the internet is widely scattered; even when the topic is found, the reader cannot be sure that the information is correct and sufficient, and the material often lacks clarity. The Pedagogical Agent presents a topic with clarity, covering the five Ws, and presents the searched topic from different aspects.

To learn a topic on the internet, a user has to switch between many different websites, and it takes a great deal of time to cover the full topic with full understanding.

Hence the need for our proposed system, which enables efficient use of electronic articles and content from all sources, and creates a meaningful article for the user as required.

Aim of the Project:

The project, the Pedagogical Agent, aims to provide the facility to create dynamic articles. As the project is research based, it aims to deliver the following objectives:

The agent will provide the reader with the facility to generate an article about any topic.

The agent will make use of e-articles, research papers, white papers and search engine contents efficiently.

The content from e-articles, research papers, white papers and search engines shall be present in the system's repository.

The agent will generate content in a manner that helps the reader understand any topic, i.e. the extracted text shall be organized in a fixed local format: Definition, Purpose, Detail, Implementation and Example.

The agent will help save the time of users from different domains, especially students and teachers.

Analysis and Design

Scope of Project:

Here we discuss the tasks and functionalities that shall be included in the project; these are defined without regard to time and cost. We express the scope definition under the following headings:

Deliverables

Functionality

Technical Structure

Deliverables:

In deliverables, we shall define both external and internal deliverable:

External Deliverable: External deliverables are things the project delivers to the users, e.g. screens and reports.

In the case of our project, the software (application), the project documentation report and a user manual shall be delivered to the user.

Internal Deliverable: Internal deliverables are things the project generates internally:

In the case of our project, the project shall generate minutes of meetings, a test plan and a project plan.

Functionality:

Having provided detail on the deliverables, we now define the functionalities that the agent undertakes to accomplish; certain limitations are also discussed. The main features of the system are as follows:

Article Content Generation

Summarization

Explore Word Meaning

Definition of Technical Terminology

Quiz Conduction

The major features are "Dynamic Article Content Generation" and "Summarization". The following tools and techniques are used for this project:

We may design our own algorithms for the above features.

Visual Studio 2010 using ASP.NET and C#, with SQL Server 2012.

Word Net Dictionary.

We are also expected to release a research paper of this project.

Limitations:

Some functionality associated with the proposed features may not be completed due to shortage of time. We aim to implement reordering of the generated text into a readable format and to provide the user with a summarization tool; however, reordering text and summarizing content into an intelligible form is a demanding job, and we did not find appropriate literature for it. We shall try to design our own algorithms for this purpose.

Technical Structure:

Here the technical structure of the agent is illustrated in terms of a context diagram, which gives a better understanding of how the agent functions.

Figure : Technical Structure

Requirements Capture

Requirements are the conditions that the system analyst must satisfy in order to achieve certain objectives. The objective of the project is to build an application that provides searching based on natural query processing techniques and helps summarize what the user is searching for. The user will not need to gather results from many different places, because natural query processing narrows the search and provides the summarized data that the user actually requires. This will benefit anyone who needs to understand a topic when given multiple sources to read from, as well as users who are confused by the large number of results returned for a search, since today's tools are not specific about the data the user actually wants.

Fact Finding Techniques

Requirements for the project are usually gathered through four main fact-finding techniques that analysts use to investigate requirements:

Interviews

Surveys

Observations

Research

Interviews

Several formal interviews were conducted by the team with the supervisor in order to gather the requirements needed for the application. These interviews helped us understand the core of the supervisor's requirements and how the application should look. Apart from the interviews, the team also conducted research under the supervision of our supervisor, Mr. Faiz-Ul-Haq Zia, at Bahria University, Karachi Campus, where students of Bahria University, faculty members and other professionals gave their opinions and views on the subject of the "Pedagogical Agent". The research helped the team carry out the analysis efficiently and effectively. Its purpose was to identify the needs of the application and to understand fully how the application would help users optimize their search results and how users would benefit from the search. After the research, the results were examined and the final features of the system were identified: the system would make use of a file system that performs much like Lucene, which creates its own file system whose files have no relation to each other. Building our own file system was motivated by the size of the data, since big data is now very widely adopted by organizations. We will therefore create our own file system, and the other features include help in summarizing the search results for a word and providing definitions of technical words.

Research

To develop a new system, the team was required to carry out observation and literature research. The literature research examined whether similar systems already exist and, if so, what different features would need to be incorporated into the current project to distinguish it from others. This required the members to search the web thoroughly for a file-system-based search engine. With some satisfying results, we started the project.

Conclusion

With the help of these fact-finding techniques, the members decided to build a system named the Pedagogical Agent, which searches and summarizes content on the reader's desired topic by gathering information from the repository. Unlike other search engines, the Pedagogical Agent does not return thousands of references; it automatically gathers data from multiple sources and arranges it in a format that helps the reader understand the topic better, and it summarizes the content. This saves the reader's time, as the agent does not require readers to arrange the document themselves before understanding it.

The content generated will be organized in confined format i.e. Definition, Explanation and Example.

Feasibility Study

A feasibility study is a process which defines exactly what a project is and what strategic issues need to be considered to assess its feasibility, or likelihood of succeeding. Feasibility studies are useful both when starting a new project and when identifying a new opportunity for an existing project. The feasibility analysis helps in identifying the problems and prospects, and in determining and describing objectives, current situations and possible outcomes. Its main objective is to determine how feasible the required project is in technical, operational and economic terms.

In order to ensure that the pedagogical system will be beneficial in the company, a feasibility study is conducted.

Economic Feasibility

The purpose of assessing economic feasibility is to identify the financial benefits and costs associated with the development project. Economic feasibility is often referred to as a cost-benefit analysis.

The pedagogical agent results in a certain number of benefits that will affect the company, as well as some unfavourable effects called costs. The pedagogical system will have the following benefits:

The readers’ problem of getting organized and arranged material will be solved.

The readers will get their required material on particular topic within few minutes.

In the long run the pedagogical system will be the most cost effective solution for the readers’ problems.

The pedagogical system allows readers to find their required material as well as the meanings of difficult words.

The reader will get the output in summarized form.

Technical Feasibility

In the technical feasibility study, the team members analyse whether the project they are working on is technically feasible: what technologies are required to develop the project, whether the required technology is available, and whether training in that technology is needed, i.e. whether the members are familiar with the tools.

Does the technology exist to be able to create the pedagogical system?

The technology which will enable the members to create the pedagogical system is readily available. The software needed is very common and the team members have the expertise to create the application. The application will be developed using Microsoft Visual Studio 2012 with the .NET Framework 4.5.

What software is required to develop the pedagogical system?

The software that will aid the team in developing the new system is easily obtainable and quite basic.

Microsoft Visual Studio 2012

Microsoft Word – to enable documentation

Notepad – to save the material in a simple text format

Is there any potential growth of the system?

If any developer wishes to extend the capability of the system, any professional would be able to make those extensions. Additional technical documentation should also be one of the aims, so that a professional will be able to understand what has been done, and this would enable future developments and growth.

Operational Feasibility

The purpose of conducting an operational feasibility study is to gain an understanding of the degree to which the system is likely to solve the reader's problems or take advantage of its opportunities. The assessment of operational feasibility includes an analysis of how the system could help readers obtain an arranged and organized document on a particular topic.

Will the users adopt the working of the pedagogical system?

The pedagogical system is very user friendly and responsive. The user will find it very easy to work with.

Will any training be required to use the pedagogical system?

Any user of the system may need a brief orientation on how to operate a computer in order to become familiar with the machine. Apart from this, a step-by-step training plan may need to be devised to explain each aspect of the system.

However, the reader does not require any special training in order to use the pedagogical system.

Stakeholders Analysis

People involved in the project, directly or indirectly, are termed stakeholders: a stakeholder is a person who is directly or indirectly involved in the success or failure of the project. A stakeholder in an organization is any group or individual who can affect, or is affected by, the achievement of the organization's objectives.

In general, any computer system has the following stakeholders:

Those responsible for design and development;

Those with a financial interest, responsible for its sale or purchase;

Those responsible for introduction and maintenance;

Those that have an interest in its use.

There are two types of stakeholders: internal and external. Internal stakeholders are those people who are 'members' of the business organization.

Project Manager: The project manager is the one who plans, motivates, organizes and controls the practitioners who do the software work. In our project, the project manager is Mr. Faiz-ul-Haq Ziya.

Practitioners: These are the people who provide the technical skills necessary to engineer the product or application. All of the team members are practitioners on the project.

Customer: The customer is the one who specifies the requirements for the software to be engineered. As the members are developing the project as a degree requirement, Mr. Faiz-Ul-Haque Ziya (Supervisor) is the customer of the project.

End user: As the application is developed to meet a degree requirement and is our final year project, our end users will be our respected teachers, students and other staff.

External stakeholders

External stakeholders are those who are not part of the system but who played a key role in guiding it along a proper path.

Which team organization structure we are using?

Each member is given an equal opportunity to share their views and suggest solutions. Major decisions are made by Mr. Faiz-Ul-Haque Ziya (Project Supervisor). The team organization structure which has been followed is a decentralized structure.

System Context Diagram

Below is the context diagram for pedagogical system.

Figure : Pedagogical Agent Context Diagram

The activities in the above Context Diagram are shown below:

The Pedagogical System shows the stored data and a reader then selects a file to read.

A reader enters word to find meanings through Word Net Dictionary and Pedagogical System displays the meaning.

A reader enters word to find definitions through Word Net Dictionary and Pedagogical System displays the definition.

System Use case diagram

A use case defines a goal-oriented set of interactions between actors and the system under consideration. Use cases capture who (actor) does what (interaction) with the system, and for what purpose (goal), without dealing with system internals. A complete set of use cases specifies all the different ways to use the system, and therefore defines all the behaviour required of the system, bounding the scope of the system. A use case describes a sequence of actions that provides something of measurable value to an actor and is drawn as a horizontal ellipse.

Figure : System Usecase Diagram

Use case description -

The above diagram describes the functionality of the Pedagogical System. Use cases are represented by ovals and the actors are represented by stick figures. The Reader actor can start the application, open a single file (and optionally open multiple files), find meanings, find definitions, and enter a title to generate a dynamic article or dynamic mini-article.

On the other hand, the Pedagogical System actor stores the documents, takes input from the user, displays meanings, displays definitions, and generates dynamic content with respect to the search.

Network diagram

A project network diagram is a graph that represents the sequence of project activities that need to be completed and their dependencies.

In the following, the total number of working days of the project is calculated from the start and end dates of the project.

The results are shown below:

Start Date: 1 October 2011

End Date: 15 July 2012

Oct 31 + Nov 30 + Dec 31 + Jan 31 + Feb 28 + Mar 31 + Apr 30 + May 31 + Jun 30 + Jul 15 = 288 days

System Design

Object-Oriented Design

Object-oriented design methods emphasize the proper and effective structuring of a complex system. Object-oriented design is a method of design encompassing the process of object-oriented decomposition and a notation for depicting logical and physical, as well as static and dynamic, models of the system under design. [BOO03]

Comparison of different choices for algorithms and data structures.

To build the system and meet the system requirements, various designs and algorithms were considered.

Format of article-

Option 1. RDF/XML-

Introduction-

RDF stands for Resource Description Framework; it became a W3C Recommendation in February 2004. RDF documents are written in XML, and the XML language used by RDF is called RDF/XML. This format was one of the options considered for natural language processing.
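For illustration, a minimal RDF/XML description might look like the following (the resource URI and property values are invented for this example; the dc: properties come from the Dublin Core vocabulary):

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- One group of statements about a single (hypothetical) web resource -->
  <rdf:Description rdf:about="http://example.org/articles/summarization">
    <dc:title>Generic Text Summarization</dc:title>
    <dc:creator>Example Author</dc:creator>
    <dc:date>2004-02-10</dc:date>
  </rdf:Description>
</rdf:RDF>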

Uses-

It targets a number of important areas, including dynamic syndication and personalization, mobile devices, resource discovery, intelligent agents, describing content for search engines, content rating, intellectual property rights and privacy preferences, and describing information about web pages, such as content, author, and creation and modification dates.

Architecture [1] –

Step 1- Gathering subject/topic metadata

HTML keywords

Subject lines in email messages

Terms extracted automatically from text using natural-language-processing algorithms.

Step 2- Process on collected data

Normalizing input data (e.g., Web page files) by removing HTML tags and other programming codes

extracting common terms from the text

analyzing relationships among extracted terms

filtering terms and relationships contextually

Structuring filtered terms for representation in an RDF graph.

Working-

The subject/topic extraction software is embedded in a library of Open Source code that:

Harvests Web pages from a list of URLs supplied by the user

extracts simple metadata and encodes it in RDF

normalizes the text for the NLP component

creates a database of RDF relationships

Makes the results available to the user through user interface.

RDF Model-

See figure 5 in Appendix

Issues-

The RDF model is very powerful and useful for maintaining semantic relations and extracting content. Nevertheless, the team members decided to search for an alternative format to work on, because RDF is a new, upcoming format and documents in RDF are still scarce. The team members decided to create an HTML/text-to-RDF converter and make use of its semantic links for extraction. The converter was developed by the team members, but it did not show satisfactory results, because to make use of an RDF document it must be written following standard conventions for its statements; the author must have a clear idea of what is being said about what, and maintaining the subject-object relation between entities was itself a challenge. The team members would have needed a thorough understanding of RDF document structures and schemas, and of the way they are built and interpreted. The idea was dropped because of the risk to the project deadline and of diverging from the actual project scope.

Option 2. Jrju Script-

Introduction [ref]-

Jrju was designed for speed and simplicity. Most computer users understand how to use plain text to express themselves (email, for example). Jrju ups the ante only slightly, adding a hierarchical structure to a collection of plain text notes. Consider the following:

Plain text is universal. Notes can be taken anywhere, anytime, on any type of computing device.

Jrju scales well. Note articles grow in complexity as more content is added. Memory available to run the program is the only limitation on note article size.

Jrju takes over the burden of creating and maintaining hypertext links. Hypertext authoring becomes more natural and robust.

Jrju was written in Perl. Perl interpreters are available for most computers at no charge.

Uses-

The major strength of Jrju is its ability to create hypertext links automatically. A link will appear anywhere you refer to a page by name in the text. For example, "JTX File Format" becomes a link since there is a page by that name elsewhere in this note article. The phrase "table of contents" is special and will always link to the top level index page for the note article.

A special "links file" allows you to define hot words with links outside the note article. For example, the Medical Informatics Home Page will be linked to its URL and my name, Richard Rathe, will be linked to my email address.

Finally, Jrju keeps track of inverse links for each page. An inverse link is a link into the current page from another page.

Working-

Jrju will recognize files whose names end with ".jtx" (for Jrju Text). (The JTX file for this page is called "files.jtx.") Each JTX file must have the following minimal format:

The page name on a line by itself

---- A blank line ----

The page content

A note article using this format will have a single table of contents index listing each page alphabetically.

To create index hierarchy, add one or more lines above the page name and before the first blank line. For example:

List of Members

Bill Clinton

All the pages with "List of Members" on the first line will be indexed on a page titled "List of Members". You can add an arbitrary number of "layers" to the hierarchy; the only restriction is that each page and index must have a unique name. For example: (See figure 6 in Appendix)
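As an additional illustration, a small hypothetical file (members.jtx) following the rules just described might look like this: an optional index line, the page name, a blank line, and then the page content.

List of Members
Bill Clinton

Plain text notes about this member go here; any mention of another
page name elsewhere in the note article becomes a hypertext link.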

Issues-

The team members decided not to opt for this, as they did not have prior knowledge of the Perl language. There was a risk of failing to implement the system this way, as the members hesitated to work with it. Keeping in mind the project scope and duration, and the serious risk posed by this lack of confidence, the members decided to look for another alternative.

Option 3. Text format-

The members discussed and researched the alternatives mentioned above and found plain text to be the most common and most easily implementable format to work with. Documents such as PDF or HTML can easily be converted to text documents; users can make use of freely available converters and then use the "Pedagogical Agent" to fulfil their needs.

Conclusion-

The team members decided upon text format. The users of the system need to convert their documents to text format in order to make use of the system. The next phases will be designed with the constraint that documents are in text format.

Layout of Dynamic Content Generated-

Option 1. User requirement based-

Overview-

In this approach, the system would initially ask the user to select the subject and then enter a topic name. The system would then identify the subtopics and present the list to the user. Next, the user would select the order in which he or she wants to arrange the document topics or contents, and read the result.

Working-

The user will select the document: from the menu bar, click File -> Open -> Computer Architecture.

Next the user will enter the topic name

The system would read all the available files on computer architecture, extract the list of topics available and display it.

Next the user will enter the desired sequence of topics in which he/she wishes to arrange and read.

The contents will be generated from all the available articles in the repository.

Issues:

This approach was dropped as it seemed too trivial and did not appear to meet the fundamental nature of dynamic content: the design (see figure 7 in Appendix) was a mere ordering of contents and topics, and did not require any intelligence to generate the content.

Option 2. Definition- Explanation- Example- Reference

Overview-

On the basis of the survey conducted in the analysis phase, the most efficient and easily understandable layout was chosen: Definition, Explanation, Example and Reference.

Definition head-

Sentences which describe the topic, give an overview of it and briefly state what the topic is about.

Explanation head-

Sentences which are directly or indirectly related to the topic are extracted. This part of the dynamic content generation (DCG) gives details about the topic by extracting only the relevant sentences from all the documents available in the repository.

Example head-

Sentences which would give examples related to the topic will be extracted and displayed.

References head- This will just display the references of the documents from where the Dynamic content is generated.

Option a. Working – Dynamic Mini-Article

The user will select the subject. The system will read all the documents available and generate dynamic article. The contents will be generated by reading all the files and extracting appropriate sentences for each head identified.

Option b. Working – Dynamic Article

The user will select the subject. The system will ask the user to select the articles from which he/she wants to generate the dynamic content. The system will then extract sentences only from the files the user has specified (number of files selected should be greater or equal to 2). The contents will be generated by reading all the files and extracting appropriate sentences for each head identified.

Conclusion-

"Option2. Definition- Explanation- Example- Reference", was finalized for Dynamic content layout. As this layout seemed to be logical and comprehensible way to ease the reader and facilitate in grabbing the concept of the topic quickly. It was decided among the group members to design and implement both options i.e. Dynamic article and Dynamic Article.

Abstraction and Separation

The main classes and objects of the system were abstracted as follows:

Dynamic Article Generation

Selecting subject

Choice of selecting articles to generate dynamic article from.

Dynamic Content generated in the form of-

Definition

Explanation

Example

References

Reordering Sentences –Using Majority order

Meaning of difficult (English) words.

Definition of technical words.

Dynamic Mini-Article Generation

Select subject

Generate Dynamic Mini-Article from all articles available in the repository

Dynamic Content generated in the form of-

Definition

Explanation

Example


