A Brief Survey on Semantic Search Engines

02 Nov 2017


Abstract: The World Wide Web (WWW) allows people to share information from large databases. These databases grow as the amount of information increases, so the information has to be searched with specialized tools known as search engines. Many search engines are available today, but retrieving meaningful information remains difficult. Semantic web technologies play a major role in overcoming this problem. In this paper we present a survey of semantic search engines, the role of various search engines in the intelligent web, and a comparison of different types of data.

Keywords: information retrieval, search engine, structured data, semi-structured data, unstructured data.

Introduction

A semantic web search engine is a search engine for the semantic web. The semantic web supports more efficient discovery and reuse of data, and it addresses interoperability problems that cannot be solved with current web technologies. It is becoming increasingly popular to publish data on the web in the form of XML documents. Current search engines, which are tools for finding HTML documents, have two main drawbacks when they search XML documents. First, it is not possible to pose queries that explicitly refer to metadata. Hence it is difficult, and sometimes impossible, to express a query that carries semantic knowledge. Second, search engines return references (i.e., links) to whole documents rather than to specific fragments. A large XML document may contain thousands of elements storing many separate pieces of information, so a reference to the whole document is not a useful answer. Instead of returning entire documents, an XML search engine should return fragments of XML documents.
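The difference between returning a whole document and returning a fragment can be illustrated with a minimal sketch. It uses Python's standard xml.etree.ElementTree module; the sample document, its element names and the helper find_fragments are hypothetical and serve only as an illustration, not as the method of any engine surveyed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE_XML = """
<library>
  <book><title>Semantic Web Primer</title><author>Antoniou</author></book>
  <book><title>Database Systems</title><author>Ullman</author></book>
  <article><title>XML Search</title><author>Cohen</author></article>
</library>
"""

def find_fragments(xml_text, tag, keyword):
    """Return each element that directly contains a <tag> child whose text includes keyword."""
    root = ET.fromstring(xml_text)
    keyword = keyword.lower()
    fragments = []
    for parent in root.iter():
        for child in parent:
            if child.tag == tag and child.text and keyword in child.text.lower():
                fragments.append(parent)
    return fragments

# The query refers to metadata (the "author" tag), not just to free text.
hits = find_fragments(SAMPLE_XML, "author", "cohen")
for fragment in hits:
    # Only the matching fragment is printed, not the whole <library> document.
    print(ET.tostring(fragment, encoding="unicode").strip())
```

Here the query names a tag as well as a keyword, and the answer is the single enclosing element rather than a link to the entire document.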

In this paper, we survey intelligent semantic search engines and compare different forms of data.

Unstructured, Fully Structured and Semi-Structured Data

2.1 Unstructured data

Unstructured data is any corporate information that is not stored in a database. It may consist of text or non-text data. Examples of textual unstructured data are emails, Word documents and PowerPoint presentations. Examples of non-text unstructured data are JPEG images, MP3 audio files and Flash video files. According to [5], unstructured data is information that does not have a predefined data model and does not fit well into tables. It is a common way of filing information, for example documents archived in a folder. Unstructured data may also be considered "loosely structured data", because the data sources do have a structure but not all data within a dataset share the same structure.

An advantage of unstructured data is that no additional classification effort is necessary. A limitation is that controlled navigation within unstructured content is not possible.

A common technology for searching unstructured text documents is full-text search. Its task is to return relevant information from many sources in response to a user's query. The simplest approach opens every document and looks for each of the query terms. This can be time consuming, so it is suitable only for a small number of documents. Another solution is to build an inverted index in advance, look up the query terms in it, and select the documents that are most relevant to the query. This improves the performance of a full-text search query. A well-known full-text search engine library is Apache Lucene.
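The two approaches described above can be contrasted in a minimal sketch. The tokenizer, the sample documents and all function names are assumptions made for illustration; a library such as Apache Lucene is far more elaborate.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase the text and strip surrounding punctuation (a deliberately crude tokenizer)."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def naive_search(docs, query):
    """Scan every document for every query term; cost grows with the collection size."""
    terms = tokenize(query)
    return sorted(doc_id for doc_id, text in docs.items()
                  if all(t in tokenize(text) for t in terms))

def build_inverted_index(docs):
    """Map each term to the set of documents that contain it (built once, in advance)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def indexed_search(index, query):
    """Look up each query term in the index and intersect the document sets."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)

# Hypothetical three-document collection.
docs = {
    1: "Semantic search engines retrieve meaningful information.",
    2: "Full-text search scans every document.",
    3: "An inverted index speeds up full-text search.",
}
index = build_inverted_index(docs)
print(naive_search(docs, "full-text search"))
print(indexed_search(index, "full-text search"))
```

Both functions return the same documents; the indexed variant avoids re-reading every document at query time, which is where the performance gain comes from.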

2.2 Fully structured data

Fully structured data follows a schema. It is stored in relational databases with static classification systems, and also in separate documents. Structured data is identifiable because it is organized in a structure. The most common form is a database, where the information is stored in rows and columns and is searchable by data type. Structured data can be processed by computers and is also efficiently organized for human readers.

An advantage of relational databases is that many existing tools and web frameworks are available to develop, maintain and manage them.

A disadvantage is that it is difficult to add new attributes or to extend a database schema that already contains content.

2.3 Semi-structured data

Semi-structured data is a form of structured data that does not conform to the formal structure of relational databases or data tables, but nevertheless contains tags to separate semantic elements. It is based on labelled graphs and enforces hierarchies of records and fields within the data. Therefore, it is known as a schemaless or self-describing structure, as in [7].

One advantage of semi-structured data is that it "has the ability to vary a structure", as in [7]. This means that data can be structured according to the requirement at hand. A typical example of semi-structured data is XML.
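A minimal sketch can make the "ability to vary a structure" concrete. The records, tag names and the helper to_dicts below are hypothetical; the point is only that two records of the same kind may carry different fields, and that the tags describe each field without a fixed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: both are <person> elements, but their fields differ.
RECORDS = """
<people>
  <person><name>Ada</name><email>ada@example.org</email></person>
  <person><name>Grace</name><phone>555-0100</phone><city>Arlington</city></person>
</people>
"""

def to_dicts(xml_text):
    """Turn each <person> into a dict keyed by whatever tags that record happens to contain."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in person}
            for person in root.findall("person")]

rows = to_dicts(RECORDS)
# The union of all tags plays the role a fixed column list would play in a table.
fields = sorted({tag for row in rows for tag in row})
print(rows)
print(fields)
```

In a relational table, the second record would require new columns to be added to the schema first; here the record simply carries its own tags.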

Search engines for unstructured data

In a full-text search engine, the index contains a list of all words occurring in the documents and, for each word, a set of references to the documents that contain it. Such indexes are called "inverted indices", as in [5]. For a query with several terms, the document sets of the individual terms are intersected; the resulting sets are called "intersection sets". The relevance of individual documents is determined using a ranking algorithm. Important ranking criteria are the density of the keyword, the importance of individual words, and the distance between the query words.
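As a rough sketch of how such a ranking might work, the following code builds a positional index, intersects the document sets of the query terms, and scores each candidate by keyword density and by the distance between query words. The scoring formula (density plus inverse minimum span) is an assumption chosen for brevity, the importance of individual words is left out, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def build_positional_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

def ranked_search(docs, query):
    """Intersect the per-term document sets, then rank by density and proximity."""
    index = build_positional_index(docs)
    terms = query.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # The "intersection set": documents that contain every query term.
    candidates = set.intersection(*(set(index[t]) for t in terms))
    scored = []
    for doc_id in candidates:
        positions = [index[t][doc_id] for t in terms]
        # Keyword density: share of the document's words that are query terms.
        density = sum(len(p) for p in positions) / len(docs[doc_id].split())
        # Distance: smallest window that covers one occurrence of each term.
        min_span = min(max(combo) - min(combo) for combo in product(*positions))
        proximity = 1.0 / (1 + min_span)
        scored.append((density + proximity, doc_id))  # assumed, unweighted sum
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical collection.
docs = {
    "a": "semantic search engine for semantic web search",
    "b": "search tools rarely use semantic knowledge",
    "c": "full text search",
}
print(ranked_search(docs, "semantic search"))
```

Document "a" ranks first because the query words occur more often and next to each other; document "c" is not in the intersection set and is never scored.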

Error tolerance makes the search process more comfortable and faster for users. It can be incorporated in a full-text search in two ways:

Equivalence classes

Equivalence classes are built by using fixed rules to generate alternative spellings for each word; the extra words are then stored in the index. Soundex is a phonetic algorithm for indexing names by sound. The goal is to encode words that sound alike to the same representation so that they can be matched. The algorithm mainly encodes consonants. Soundex is the most widely known phonetic algorithm, and its name is often used as a synonym for "phonetic algorithm". As in [11], the Soundex algorithm evaluates each letter in the input word and assigns it a numeric value. Its main function is to convert each word into a numeric code and then eliminate repeated codes and vowels. It then returns the first four characters of the resulting string.
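The steps just described can be sketched as follows. This is a minimal version of the commonly documented American Soundex rules (first letter kept, consonants mapped to digits, adjacent repeated codes dropped, vowels removed, result padded to four characters); the treatment of "h" and "w" and the zero padding follow that common variant and are not spelled out in the text above.

```python
# Consonant groups of the commonly documented American Soundex variant.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Encode a word as its first letter followed by three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w are skipped and do not separate repeated codes
        code = CODES.get(letter, "")  # vowels have no code
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a code may repeat after it
    return (result + "000")[:4]  # pad short codes, keep the first four characters

for name in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
    print(name, soundex(name))
```

Names that sound alike, such as "Robert" and "Rupert", receive the same code and therefore fall into the same equivalence class in the index.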

Query rewriting

Query rewriting transforms the query into the closest set of index words. During query processing, statistical similarity, semantic context and syntactic criteria can be incorporated to generate improved queries. The modified queries are then matched against the index. A common feature of both approaches is that they find similarity between terms in a first step and then proceed with an exact match in a second step.
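A minimal sketch of query rewriting is shown below. It uses only string similarity, via get_close_matches from Python's standard difflib module, as the notion of "closest index word"; semantic context and syntactic criteria are not modelled. The index vocabulary, the cutoff value and the function name rewrite_query are assumptions.

```python
import difflib

def rewrite_query(query, index_terms, cutoff=0.75):
    """Replace each query word that is not in the index by its closest index word."""
    vocabulary = sorted(index_terms)
    rewritten = []
    for term in query.lower().split():
        if term in index_terms:
            rewritten.append(term)  # already an index word, keep it
            continue
        # Step 1: find a similar term in the index vocabulary.
        matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        # Words without a sufficiently close match are left unchanged.
        rewritten.append(matches[0] if matches else term)
    return rewritten

# Hypothetical index vocabulary.
index_terms = {"engine", "retrieval", "search", "semantic"}
print(rewrite_query("semantc serch engine", index_terms))
```

Step 2, the exact match, would then run the rewritten terms against the index as in any ordinary lookup.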

Both equivalence classes and query rewriting perform the search on the index.

Various semantic search engines

Many semantic search engines have been developed and implemented in different environments, and their mechanisms can be used to improve present search engines.

Sara Cohen, Jonathan Mamou, Yaron Kanza and Yehoshua Sagiv presented XSEarch, a semantic search engine for XML, as in [12]. It has a simple query language suitable for end users, and it returns semantically related document fragments that satisfy the user's query. Query answers are ranked using a vector space model. Advanced indexing techniques were developed for an efficient implementation of XSEarch. The recall, precision and performance of the different techniques were measured experimentally. The results indicate that XSEarch is efficient and scalable and gives high-quality results.
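Ranking with a vector space model can be sketched generically as follows. Each fragment and the query are turned into TF-IDF weighted term vectors and compared by cosine similarity. This is a textbook formulation under assumed weighting, not the specific ranking formula of XSEarch; the fragments and function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(fragments):
    """Build one TF-IDF vector (term -> weight) per fragment, plus the IDF table."""
    tokenized = {fid: text.lower().split() for fid, text in fragments.items()}
    total = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    # Assumed smoothing: log(N / df) + 1, so terms in every fragment keep some weight.
    idf = {term: math.log(total / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {fid: {term: tf * idf[term] for term, tf in Counter(tokens).items()}
               for fid, tokens in tokenized.items()}
    return vectors, idf

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(fragments, query):
    """Return fragment ids ordered by decreasing similarity to the query."""
    vectors, idf = tfidf_vectors(fragments)
    counts = Counter(query.lower().split())
    query_vec = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    scored = [(cosine(query_vec, vec), fid) for fid, vec in vectors.items()]
    return [fid for score, fid in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]

# Hypothetical text content of three XML fragments.
fragments = {
    "f1": "xml search engine",
    "f2": "relational database schema",
    "f3": "semantic search of xml fragments with xml tags",
}
print(rank(fragments, "xml search"))
```

Fragments that share no term with the query get a similarity of zero and are dropped from the answer list.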

Wang et al. [13] present a semantic approach to automatically recognize tables with various layouts in order to retrieve information from them. They first tag table cells, using domain knowledge to identify the semantic relationships between cells. They then apply layout-syntax rules to transform the tables into database format and retrieve the data with query languages. The approach describes the layout with a layout-syntax grammar and matches it against given templates, which can be used to analyze the semantics of table cells. Semantic transformation is then used to convert the tables to database format.

Eser Kandogan, Sriram Raghavan, Shivakumar Vaithyanathan, Huaiyu Zhu and Rajasekar Krishnamurthy present Avatar Semantic Search [21], a prototype search engine that exploits annotations in the context of classical keyword search. Avatar Semantic Search executes a variety of annotators and stores the resulting annotations in a structured database called the annotation store. The Avatar Semantic Search engine has two parts:

1. Extraction and Representation

Avatar Semantic Search uses the UIMA framework [4] to execute annotators. The UIMA framework makes it possible to define a workflow consisting of a chain of annotators. Documents are fed in at one end of the workflow and the resulting annotations are available at the other end. The annotations produced are then persisted in the annotation store.

2. Interpretation

Avatar Semantic Search uses the notion of keyword query interpretation. Each interpretation, when executed over the annotation store, produces a set of documents as result.
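The two parts can be illustrated with a toy sketch in plain Python. It does not use the UIMA framework or any Avatar code: the regular-expression annotators, the in-memory annotation store and the function names are all hypothetical stand-ins chosen to show the data flow only.

```python
import re
from collections import defaultdict

def email_annotator(text):
    """Toy annotator: emit an ("Email", value) annotation for each e-mail address."""
    pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
    return [("Email", m.group()) for m in re.finditer(pattern, text)]

def phone_annotator(text):
    """Toy annotator: emit a ("Phone", value) annotation for each NNN-NNNN number."""
    return [("Phone", m.group()) for m in re.finditer(r"\b\d{3}-\d{4}\b", text)]

def run_pipeline(docs, annotators):
    """Part 1: feed documents through the annotator chain and persist the annotations."""
    store = defaultdict(list)  # annotation type -> [(doc_id, value), ...]
    for doc_id, text in docs.items():
        for annotate in annotators:
            for annotation_type, value in annotate(text):
                store[annotation_type].append((doc_id, value))
    return store

def interpret(store, annotation_type, keyword):
    """Part 2: one interpretation of a keyword, executed over the annotation store."""
    keyword = keyword.lower()
    return sorted({doc_id for doc_id, value in store.get(annotation_type, [])
                   if keyword in value.lower()})

# Hypothetical documents.
docs = {
    "d1": "Contact alice@example.com or call 555-1234.",
    "d2": "Bob can be reached at 555-9876.",
    "d3": "No contact details here.",
}
store = run_pipeline(docs, [email_annotator, phone_annotator])
print(interpret(store, "Phone", "555"))    # keyword read as part of a phone number
print(interpret(store, "Email", "alice"))  # keyword read as part of an e-mail address
```

The same keyword can be read under several annotation types; each reading is one interpretation and yields its own set of documents.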

Yuangui Lei, Victoria Uren and Enrico Motta present SemSearch [10], which aims to produce precise answers that both satisfy user queries and are self-explanatory and understandable, by taking advantage of the explicit semantics of information available in the context of the semantic web. It also supports complex queries. It provides means to understand user queries and to translate them into formal queries; to do so, it finds semantic entity matches for each keyword. SemSearch overcomes the problem of knowledge overhead by supporting a Google-like query interface, which allows end users to specify queries in terms of a queried subject and a combination of multiple keywords. SemSearch focuses on the user queries when generating formal queries. The authors have implemented a prototype of the search engine and applied it in the semantic web portal of their lab.

Maciej Ceglowski, Aaron Coburn and John Cuadrado [11] present a graph-based algorithm for searching large collections of unstructured data, implemented as a search engine that offers relevant information to users. The technique uses a term-document matrix (TDM) to generate a bipartite graph of term and document nodes representing the document collection, where each non-zero value corresponds to an edge connecting a term node to a document node. The resulting graph is called a contextual network graph. It can be searched by a simple recursive procedure: a query node is energized, and the energy is allowed to propagate to other nodes along the edges of the graph according to a set of simple rules. Nodes that acquire energy above a specified threshold comprise the result set. This technique is reported to offer better performance than the latent semantic indexing (LSI) technique.
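The recursive search can be sketched as follows. The propagation rule used here (a node splits its incoming energy by its number of edges, passes a share to every neighbour except the one it came from, and stops once a share falls below the threshold) is an assumed simplification, since the text above does not spell out the rules; the documents and all names are hypothetical.

```python
from collections import defaultdict

def build_graph(docs):
    """Bipartite graph: one edge per non-zero entry of the term-document matrix."""
    graph = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            graph[("term", term)].add(("doc", doc_id))
            graph[("doc", doc_id)].add(("term", term))
    return graph

def propagate(graph, node, energy, threshold, acquired, sender=None):
    """Deposit energy at node, then pass equal shares on to the other neighbours."""
    acquired[node] += energy
    neighbours = [n for n in graph[node] if n != sender]
    if not neighbours:
        return
    # Assumed rule: the energy is divided by the node's degree, so it decays
    # at every node that has more than one edge and the recursion terminates.
    share = energy / len(graph[node])
    if share < threshold:
        return
    for neighbour in neighbours:
        propagate(graph, neighbour, share, threshold, acquired, sender=node)

def graph_search(docs, query_term, energy=1.0, threshold=0.05):
    """Energize the query term node; return documents whose energy reaches the threshold."""
    graph = build_graph(docs)
    start = ("term", query_term.lower())
    if start not in graph:
        return []
    acquired = defaultdict(float)
    propagate(graph, start, energy, threshold, acquired)
    hits = [(e, node[1]) for node, e in acquired.items()
            if node[0] == "doc" and e >= threshold]
    return [doc_id for _, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]

# Hypothetical collection.
docs = {
    "d1": "semantic search",
    "d2": "search engine",
    "d3": "engine index",
    "d4": "relational database",
}
print(graph_search(docs, "semantic"))
```

Documents "d2" and "d3" do not contain the query term but still receive energy through shared terms, which is how the graph captures context; "d4" is disconnected and receives none.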

Acknowledgments

Bhavana V. Kumbhare and S. J. Karale wish to thank the International Conference Advancement in Information Technology. This work was supported in part by the Department of Computer Science and Engineering, Yashwantrao Chavan College of Engineering.


