The Mining Techniques Using Semantic Web Technology


02 Nov 2017


Abstract:

The growing number of public data resources makes integrative bioinformatics experimentation increasingly important in life sciences research. With the explosion of bioinformatics data and tools accessible online, systems integration has become essential for further progress. Bioinformatics currently relies heavily on the Web, but the Web is geared towards human interaction rather than automated processing. The Semantic Web's vision is to enable this automation by annotating web content and by providing adequate reasoning languages.

Introduction:

People repeatedly visit a number of sites and download information to do their work. The process is labor-intensive: it requires opening a new browser session for each site, and data frequently must be cut, reformatted, and pasted into another application before it is really useful [1].

The Semantic Web approach greatly simplifies this process: "With the Web today, if you need data from 10 sites, you need to go to all 10 sites and cut and paste the data to get an integrated view. Semantic Web pushes this job of assembling data out from the desktop into the network. With the Semantic Web, the network knows how to get and assemble the data."

Information is often distributed across multiple systems and recorded in ways that make it difficult to piece together the complete picture. Differences in data formats, naming schemes and network protocols among information sources, both public and private, must be overcome. User interfaces need not only to tap into these diverse sources but also to help users filter out extraneous information and highlight the key relationships hidden within the aggregated data.

A Semantic Web browser can be configured to go to multiple sites, find the specific information required, retrieve it, and display it in a single view. In essence, this application of Semantic Web technology is akin to a next-generation portal. Such capabilities make the Semantic Web very interesting to life science organizations [2].

The Semantic Web is an evolving collection of knowledge, built to allow anyone on the Internet to add what they know and find answers to their questions. Rather than being held in natural language text, information on the Semantic Web is maintained in a structured form that is fairly easy for both computers and people to work with.
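To illustrate what "structured form" means here, the sketch below expresses one sentence as subject-predicate-object triples and writes them in the N-Triples line format. It is a minimal standard-library illustration, not the API of any Semantic Web toolkit; the `http://example.org/bio#` namespace and the property name `locatedOn` are hypothetical, and only the `rdf:type` URI is the real W3C one.

```python
from typing import NamedTuple

# Hypothetical namespace used only for illustration.
EX = "http://example.org/bio#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# "BRCA1 is a gene located on chromosome 17" as structured statements.
facts = [
    Triple(EX + "BRCA1", RDF_TYPE, EX + "Gene"),
    Triple(EX + "BRCA1", EX + "locatedOn", EX + "Chromosome17"),
]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples)


def objects(triples, subject, predicate):
    """Answer a simple question: which objects does `subject` have for `predicate`?"""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


print(to_ntriples(facts))
print(objects(facts, EX + "BRCA1", EX + "locatedOn"))
```

Because every statement has the same three-part shape, a program can answer the question with a simple filter instead of parsing prose.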

Objective:

The objective is to create the core of a Bioinformatics Semantic Web populated by a number of sample data sources and applications representative of the use of the Web in Bioinformatics and to demonstrate novel, reasoning-based solutions dealing with the following problems:

Rules for mediation and for formulating complex queries

Consistent integration of Bioinformatics data

Adaptive portals for molecular biologists

Bioinformatics is an ideal field for testing Semantic Web technologies for three reasons:

Web-based systems and Web databases were adopted very early in Bioinformatics.

The dramatic increase in data produced in the field calls for novel processing methods.

The high heterogeneity of Bioinformatics data requires semantic-based integration methods.

Methodology: [3]

Connections are established between the ontology system and any databases, spreadsheets, or other systems that hold relevant information for that modeling problem.

The ontology is created using RDF/OWL [4], and an interface is built to allow domain experts to edit the ontology.
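One thing an RDF/OWL ontology gives the model builder is inference over the class hierarchy. The sketch below applies the RDFS subclass rule (if x has type C and C is a subclass of D, then x has type D) transitively over a tiny ontology. It is a hand-written Python illustration under stated assumptions, not how Protégé or Jena implement reasoning; the `ex:` class names and `ex:protein1` are hypothetical.

```python
RDFS_SUBCLASS = "rdfs:subClassOf"
RDF_TYPE = "rdf:type"

# Hypothetical mini-ontology; class names are illustrative only.
ontology = [
    ("ex:Enzyme", RDFS_SUBCLASS, "ex:Protein"),
    ("ex:Protein", RDFS_SUBCLASS, "ex:Macromolecule"),
    ("ex:Kinase", RDFS_SUBCLASS, "ex:Enzyme"),
]
instances = [("ex:protein1", RDF_TYPE, "ex:Kinase")]


def superclasses(triples, cls):
    """Return every direct or indirect superclass of `cls`."""
    direct = {}
    for s, p, o in triples:
        if p == RDFS_SUBCLASS:
            direct.setdefault(s, set()).add(o)
    seen, stack = set(), [cls]
    while stack:  # depth-first walk; `seen` also protects against cycles
        for sup in direct.get(stack.pop(), ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def infer_types(instances, ontology):
    """Add rdf:type triples implied by the subclass hierarchy."""
    inferred = set(instances)
    for s, p, o in instances:
        if p == RDF_TYPE:
            for sup in superclasses(ontology, o):
                inferred.add((s, RDF_TYPE, sup))
    return inferred


for triple in sorted(infer_types(instances, ontology)):
    print(triple)
```

A domain expert who edits only the subclass statements changes what every instance is inferred to be, without touching the instance data.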

Libraries are created in a partnership between ourselves and domain experts.

Taxonomies are populated by model builders who want to use them for their modeling problem.

Taxonomies are color-coded for ease of understanding; this part of the diagram was built with Vanguard System. A link was created between the ontology tool and this decision support and calculation tool, so that Vanguard System reads information from the ontology tool.

There are two sorts of constraints that can be used to make it easier for users to build and adapt models: constraints on how the ontology and models are built, and user interface constraints that reduce the scope for error.
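Both kinds of constraint can be sketched with one table of allowed subject and object classes per property: the model-building check rejects statements that violate it, and the interface uses it to offer only valid properties. This is an assumption-laden illustration. The property names are hypothetical, and the check is closed-world validation, which differs from standard RDFS semantics (there, domain and range declarations produce inferences rather than errors).

```python
# Hypothetical constraints: property -> (required subject class, required object class).
CONSTRAINTS = {
    "ex:encodes": ("ex:Gene", "ex:Protein"),
    "ex:locatedOn": ("ex:Gene", "ex:Chromosome"),
}


def check_statement(subject_class, prop, object_class):
    """Model-building constraint: return error messages (empty list means valid)."""
    if prop not in CONSTRAINTS:
        return [f"unknown property {prop}"]
    domain, rng = CONSTRAINTS[prop]
    errors = []
    if subject_class != domain:
        errors.append(f"{prop} expects a subject of class {domain}, got {subject_class}")
    if object_class != rng:
        errors.append(f"{prop} expects an object of class {rng}, got {object_class}")
    return errors


def allowed_properties(subject_class):
    """User-interface constraint: only offer properties valid for this subject class."""
    return sorted(p for p, (domain, _) in CONSTRAINTS.items() if domain == subject_class)


print(check_statement("ex:Protein", "ex:encodes", "ex:Gene"))
print(allowed_properties("ex:Gene"))
```

Driving both checks from the same table keeps the interface and the model builder consistent when the ontology changes.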

The color coding makes the calculation clearer: because all taxonomies can be used in any calculation, the result is a multicolored tree that represents the entire calculation history. User choices, made either manually or via a search, determine how items are related for the calculation. Color can also be used to represent cost, time, or uncertainty.

Each node can also represent uncertainty, and we have prototyped including uncertainty expressions in the calculations.
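A result tree whose nodes carry both a taxonomy (its color) and an uncertainty can be sketched as below. The essay does not say how uncertainty expressions were represented, so this sketch assumes simple intervals (a low and a high bound) propagated through sums and products of non-negative quantities such as cost or time; the node names and numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    taxonomy: str        # taxonomy the item comes from (drives the color coding)
    low: float = 0.0     # lower bound of the estimate (leaves only)
    high: float = 0.0    # upper bound of the estimate (leaves only)
    op: str = "leaf"     # "leaf", "sum" or "product"
    children: list = field(default_factory=list)

    def evaluate(self):
        """Return (low, high), propagating interval uncertainty up the tree."""
        if self.op == "leaf":
            return self.low, self.high
        bounds = [child.evaluate() for child in self.children]
        if self.op == "sum":
            return sum(b[0] for b in bounds), sum(b[1] for b in bounds)
        if self.op == "product":
            # Assumes non-negative quantities, so the bounds multiply directly.
            lo, hi = 1.0, 1.0
            for b_lo, b_hi in bounds:
                lo, hi = lo * b_lo, hi * b_hi
            return lo, hi
        raise ValueError(f"unknown op {self.op}")


def taxonomies(node):
    """Collect the taxonomies (colors) that appear anywhere in the tree."""
    found = {node.taxonomy}
    for child in node.children:
        found |= taxonomies(child)
    return found


hours = Node("hours", "Process", 10, 12)
rate = Node("rate", "Resource", 50, 60)
labour = Node("labour", "Process", op="product", children=[hours, rate])
material = Node("material", "Material", 200, 250)
total = Node("total", "Cost", op="sum", children=[labour, material])

print(total.evaluate())  # -> (700.0, 970.0)
print(sorted(taxonomies(total)))
```

Intervals were chosen because they propagate exactly and need no distributional assumptions; a probabilistic representation would be the natural next step if the prototype's uncertainty expressions are statistical.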

The result tree can be represented on the web and in other programs, which allows further searching, processing and evaluation of results. Visualization techniques and searchable languages such as XML and SVG can assist.
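As a sketch of putting a result tree into XML so that it can be searched by other programs, the example below serialises a small nested structure with Python's `xml.etree.ElementTree` and then queries it with that module's limited XPath support. The element and attribute names (`node`, `name`, `taxonomy`, `value`) are hypothetical; the essay does not define an XML vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical result tree: each node has a name, taxonomy (color), value and children.
result = {
    "name": "total_cost", "taxonomy": "Cost", "value": 700.0,
    "children": [
        {"name": "labour", "taxonomy": "Process", "value": 500.0, "children": []},
        {"name": "material", "taxonomy": "Material", "value": 200.0, "children": []},
    ],
}


def to_xml(node):
    """Recursively convert a result-tree node into an XML element."""
    elem = ET.Element("node", name=node["name"], taxonomy=node["taxonomy"],
                      value=str(node["value"]))
    for child in node["children"]:
        elem.append(to_xml(child))
    return elem


root = to_xml(result)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Search the serialised tree: all descendant nodes from the "Process" taxonomy.
process_nodes = root.findall(".//node[@taxonomy='Process']")
print([n.get("name") for n in process_nodes])
```

For richer queries the same document could be loaded into an XQuery engine, as the essay suggests.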

Experts such as designers can interact with the ontology, the model, and the results. The intention is a two-way feedback mechanism in which the expert can make changes at any stage and those changes filter through into updated results, supporting an iterative cycle of results and rework.

Converting Bioinformatics data types into RDF/OWL format.
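As an example of converting a common bioinformatics data type into RDF-style statements, the sketch below parses FASTA text and emits one triple per property of each sequence record. It assumes well-formed FASTA (a `>` header line followed by sequence lines). The base URI and the `description`, `sequence` and `length` property names are hypothetical placeholders for whatever vocabulary a real project would adopt.

```python
BASE = "http://example.org/seq/"      # hypothetical namespace for records
VOCAB = "http://example.org/vocab#"   # hypothetical property vocabulary


def parse_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA record."""
    ident, desc, seq = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                yield ident, desc, "".join(seq)
            header = line[1:].split(maxsplit=1) or [""]
            ident = header[0]
            desc = header[1] if len(header) > 1 else ""
            seq = []
        else:
            seq.append(line)
    if ident is not None:
        yield ident, desc, "".join(seq)


def fasta_to_triples(text):
    """Turn each record into (subject URI, property URI, literal value) triples."""
    triples = []
    for ident, desc, seq in parse_fasta(text):
        subject = BASE + ident
        triples.append((subject, VOCAB + "description", desc))
        triples.append((subject, VOCAB + "sequence", seq))
        triples.append((subject, VOCAB + "length", len(seq)))
    return triples


sample = ">seq1 example protein\nMKTAYIAK\nQRQISFVK\n>seq2\nACGT\n"
for t in fasta_to_triples(sample):
    print(t)
```

Giving every record a URI is what later lets statements from other databases attach to the same sequence.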

Integration of Databases using Semantic Web tools.
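The core idea of integrating databases with Semantic Web tools is that once each source is expressed as triples using shared identifiers, merging the sources is a set union and a cross-source question becomes a join on those identifiers. The sketch below shows this with two hypothetical sources, a sequence table and a structure table that cites the same protein accession; all URIs, field names and values are illustrative assumptions.

```python
PROTEIN = "http://example.org/protein/"      # hypothetical shared identifier scheme
STRUCTURE = "http://example.org/structure/"  # hypothetical structure namespace

# Source A (hypothetical): sequence database rows keyed by accession.
source_a = [{"acc": "Q1", "name": "kinase X", "length": 350}]
# Source B (hypothetical): structure database rows citing the same accession.
source_b = [{"struct_id": "S1", "protein_acc": "Q1", "method": "X-ray"}]


def triples_from_a(rows):
    for r in rows:
        s = PROTEIN + r["acc"]
        yield (s, "ex:name", r["name"])
        yield (s, "ex:length", r["length"])


def triples_from_b(rows):
    for r in rows:
        s = STRUCTURE + r["struct_id"]
        yield (s, "ex:describesProtein", PROTEIN + r["protein_acc"])
        yield (s, "ex:method", r["method"])


# Merging the two sources is just a union of their triples.
graph = set(triples_from_a(source_a)) | set(triples_from_b(source_b))


def structures_for(graph, name):
    """Join across sources: structures of proteins with the given name."""
    proteins = {s for s, p, o in graph if p == "ex:name" and o == name}
    return sorted(s for s, p, o in graph if p == "ex:describesProtein" and o in proteins)


print(structures_for(graph, "kinase X"))
```

The join only works because both sources were mapped to the same protein URI; agreeing on identifiers is the real integration effort.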

Software that has been investigated for representing ontologies and translating them into program code and visualizations includes Stanford University's Protégé [4], Jena [5], and KAON [6]. Applications built with ontology tools such as these that also include a development environment for calculation and decision support are Metatomix m3t4 [7], TopBraid Composer [8], and General Electric's ACUITy enterprise modelling tool [8]. These tools include Java Eclipse extensions for high-level programming. We have also investigated transformations that translate the ontology into representations in other languages and tools. We have prototyped this translation for the decision support tools Vanguard System [1] and Cost Estimator [9], and for languages including XML (eXtensible Markup Language) and Java. XML has mainly been used as a neutral format for representing information, but its rich structure also makes it suitable for use as a programming language, e.g. AspectXML [10]. Further research could represent the information in meta-languages such as metaL [11] and Simkin [12]. The result documents could be searched using XQuery within eXist [13] and SPARQL (SPARQL Protocol and RDF Query Language) [14], and edited using XForms editors such as Orbeon XForms [15].
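To show what a SPARQL query does underneath, the sketch below evaluates a basic graph pattern: a conjunction of triple patterns in which terms beginning with `?` are variables. The query `SELECT ?g ?p WHERE { ?g ex:encodes ?p . ?p ex:hasFunction ex:kinaseActivity }` corresponds to the two patterns at the end. This is a simplified nested-loop matcher written for illustration, not Jena's or any other engine's implementation, and the `ex:` data is hypothetical.

```python
def match_pattern(pattern, triple, bindings):
    """Try to unify one triple pattern with a triple; return new bindings or None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # variable
            if term in new and new[term] != value:
                return None               # conflicts with an earlier binding
            new[term] = value
        elif term != value:               # constant that does not match
            return None
    return new


def query(graph, patterns):
    """Evaluate a conjunction of triple patterns, like a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


graph = [
    ("ex:geneA", "ex:encodes", "ex:protA"),
    ("ex:protA", "ex:hasFunction", "ex:kinaseActivity"),
    ("ex:geneB", "ex:encodes", "ex:protB"),
    ("ex:protB", "ex:hasFunction", "ex:transport"),
]

print(query(graph, [("?g", "ex:encodes", "?p"),
                    ("?p", "ex:hasFunction", "ex:kinaseActivity")]))
```

Real engines reorder patterns and use indexes rather than scanning the whole graph for each pattern, but the result set they compute is the same.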

Bioinformatics Databases to be integrated:

Genomics and Commercial Databanks

NCBI

Nucleic Acids Research Database

BioDirectory - BioWiki

CLC Bioinformatics Database

Biology Links

Databases for Molecular Biology

Nucleotide Sequence Databases

GenBank

EMBL Nucleotide Sequence Database

DDBJ

Human Genome Sequencing Center at Baylor College of Medicine

IMGT

Protein Sequence Databases

ExPASy (Expert Protein Analysis System) proteomics server

UniProt

PIR Protein Sequence Database

Swiss-Prot

iProClass

PROSITE

MEROPS

The Center for Molecular Modeling (CMM)

Protein Data Bank (PDB)

Data Aggregation in Bioinformatics: [19]

Researchers in bioinformatics are very excited by the promise of the Semantic Web. They want to integrate data from many different data sources so that they can make well-informed decisions, yet data integration has been challenging. The difficulties stem from data being made available in different formats, for example different tab-delimited file formats, different XML schemas, and different relational models. The task is made harder still because data models frequently change as science progresses and researchers learn that additional data is also relevant. In addition, acronyms collide across data sources, and data can be of different types, for example graphs, images, text, and chemical structures.
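Two of the difficulties above, different source formats and acronym collision, can be illustrated together. In the sketch below one source is tab-delimited and the other is XML. Both are mapped into the same triple shape, and each source gets its own namespace so that the same acronym becomes two distinct URIs instead of silently merging. The acronym `ABC`, the file contents and the namespaces are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical inputs: the same acronym means different things in each source.
TSV = "symbol\tdescription\nABC\tgene symbol in source 1\n"
XML = '<assays><assay code="ABC" label="assay abbreviation in source 2"/></assays>'

NS1 = "http://example.org/source1/"  # hypothetical namespace for source 1
NS2 = "http://example.org/source2/"  # hypothetical namespace for source 2


def from_tsv(text):
    """Map tab-delimited rows to (URI, rdfs:label, text) triples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(NS1 + row["symbol"], "rdfs:label", row["description"]) for row in reader]


def from_xml(text):
    """Map <assay> elements to the same triple shape."""
    root = ET.fromstring(text)
    return [(NS2 + a.get("code"), "rdfs:label", a.get("label")) for a in root.iter("assay")]


graph = from_tsv(TSV) + from_xml(XML)
for triple in graph:
    print(triple)
```

If two sources later turn out to mean the same thing by an acronym, that can be stated explicitly (for example with `owl:sameAs`) rather than assumed.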

Many data integration projects currently fail. One of the most common reasons for the failure is the inability to extend the data model to incorporate new data, or the inability to re-use data in ways that it was not originally intended. RDF provides a very flexible model for adding new data to a data model and for re-using data in ways that it was not originally intended. Researchers need to be able to re-use data and re-aggregate applications. They also like the idea of the serendipitous discovery of new information.
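The flexibility claimed for RDF comes from its uniform triple shape: new kinds of data are added as new triples, with no schema migration, and existing queries keep working. The sketch below shows this with a plain set of triples; the gene, the property names and the newly arriving "expressed in" data are hypothetical.

```python
# Hypothetical starting data about one gene.
graph = {
    ("ex:geneA", "ex:symbol", "GA"),
    ("ex:geneA", "ex:chromosome", "7"),
}


def describe(graph, subject):
    """Return all property/value pairs for a subject as a dict of sets."""
    out = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, set()).add(o)
    return out


before = describe(graph, "ex:geneA")

# A new kind of data arrives later. A fixed relational table would need a
# schema change; here the new property is simply added as more triples.
graph |= {("ex:geneA", "ex:expressedIn", "ex:liver")}

after = describe(graph, "ex:geneA")
print(sorted(before))
print(sorted(after))
```

The same `describe` function, written before the new property existed, returns it without modification, which is the kind of unplanned re-use the paragraph describes.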

Advantages of Semantic Web:

the ability to integrate heterogeneous data through common explicit semantics

the expression of rich and well-defined models of systems

the formal annotation of findings and interpretations

the ability to embed models and semantics directly within online publications

the application of logic to infer new insights

the ability to search based on the meaning of terms, and to make data machine-processable.

Software to be used:

BioDASH - for semantic data integration

Eclipse - Eclipse is an open source community whose projects are focused on building an open development platform composed of extensible frameworks, tools and runtimes for building, deploying and managing software across the lifecycle.

SWSF (Semantic Web Services Framework) - to specify formal characterizations of Web service concepts and descriptions of individual services

TopBraid Composer - an enterprise-class modeling environment for developing Semantic Web ontologies and building semantic applications

Jena - an open source Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine.

JDOM - a Java representation of an XML document that provides a way to read, manipulate, and write that document easily and efficiently

SWOOP - a tool for creating, editing, and debugging OWL ontologies. It was produced by the MIND lab at the University of Maryland, College Park, but is now an open source project with contributors from all over the world.

Altova SemanticWorks - visual RDF and OWL editor for the Semantic Web (Commercial SW)

Protégé - is a free, open-source platform that provides a growing user community with a suite of tools to construct domain models and knowledge-based applications with ontologies. (supported by the National Library of Medicine)

Oracle Spatial 11g - includes an open, scalable, secure and reliable RDF management platform. It is based on a graph data model in which RDF triples are persisted, indexed and queried, similar to other object-relational data types. The system also implements subsets of OWL Full (Commercial SW)

SPARQL - an RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language

Haystack - is an extensible Semantic Web Browser developed by the Haystack research group at the MIT Computer Science and Artificial Intelligence Laboratory

IBM's Web Ontology Manager - a lightweight, Web-based tool for managing ontologies expressed in the Web Ontology Language (OWL).

Metatomix M3t4.Studio Semantic Toolkit - a free set of Eclipse plug-ins that allows developers to create and manage OWL ontologies and RDF documents

OntoStudio - a professional development environment for ontology-based solutions. It combines modeling tools for ontologies and rules with components for the integration of heterogeneous data sources. As ontology languages, OntoStudio supports the W3C standards OWL and RDF(S), and F-Logic for the logic-based processing of rules.
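Several of these tools (Jena, Oracle's RDF platform) persist triples and index them for querying. The sketch below shows one common indexing idea in miniature: keep the same triples in three nested indexes (subject-predicate-object, predicate-object-subject, object-subject-predicate), so that a pattern with any single bound term starts from a direct lookup instead of a full scan. It is an in-memory illustration under that assumption, not how Oracle or Jena actually lay out their storage; the `ex:` data is hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """In-memory triple store with SPO, POS and OSP indexes."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def triples(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        if s is not None:  # subject bound: start from the SPO index
            for pp, objs in self.spo.get(s, {}).items():
                if p is None or pp == p:
                    for oo in objs:
                        if o is None or oo == o:
                            yield (s, pp, oo)
        elif p is not None:  # predicate bound: start from the POS index
            for oo, subs in self.pos.get(p, {}).items():
                if o is None or oo == o:
                    for ss in subs:
                        yield (ss, p, oo)
        elif o is not None:  # only object bound: use the OSP index
            for ss, preds in self.osp.get(o, {}).items():
                for pp in preds:
                    yield (ss, pp, o)
        else:  # no term bound: full scan
            for ss, po in self.spo.items():
                for pp, objs in po.items():
                    for oo in objs:
                        yield (ss, pp, oo)


store = TripleStore()
store.add("ex:geneA", "ex:encodes", "ex:protA")
store.add("ex:geneB", "ex:encodes", "ex:protB")
store.add("ex:protA", "ex:hasFunction", "ex:kinaseActivity")

print(sorted(store.triples(p="ex:encodes")))
print(list(store.triples(o="ex:protA")))
```

The cost of this design is that every triple is stored three times, which is the usual trade of space for query speed.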

Semantic Web Search Engines:

Swoogle

Hakia

Powerset

SWSE

Semantic web search

Falcons

Sindice

Watson

Yahoo! MicroSearch

MultiCrawler

Uriqr

TomHeath

Zitgist Search

SHOE

Outcome:

The goal of the experiment is to demonstrate the feasibility of applying a Semantic Web aggregation model to a Bioinformatics problem using basic RDF/OWL tools and methodologies, on both the back end (servers) and the front end (browsers), to extract meaningful knowledge.


