The History About The Disease Network

Published Date: 02 Nov 2017

Disease Network

Abstract: Ontologies are used to model the entities and objects of a specific domain, together with the relationships between them in a data model, and to support reasoning about those entities. They are used to construct and visualize domains. The disease ontology was a comprehensive knowledge base of numerous inherited, developmental and acquired human diseases [1]. It was relevant because it was useful to medical experts as well as to casual users. Numerous disease ontologies already exist; the disease ontology developed here was an extension of those previous ontologies.

Introduction

From the Eshuma Code of Babylon, dating as far back as the 23rd century BC, to the medical results published in the 21st century AD, experts have recorded deviations in medicine in an attempt to decipher the enigma of disease. Diagnostic assessment, analysis and data correlation, accumulated over millennia of investigation, can be vastly simplified through semantically accurate representations such as those an ontology provides. Medical and research associations have created numerous terminologies to consistently record mortality data and to standardize healthcare recording and cataloguing [1]. Numerous ontologies existed that included diseases and their classifications, but none enabled symptom mining to build a disease ontology. The ontology developed here was not only a comprehensive database of sorts based on the ICD-10, but also supported symptom mining and source identification. The motivation for the project was that the ontology extended current ontologies: it supported symptom mining based on diseases, relationships between diseases and detailed information about diseases, all contained in a single ontology. Extending the ontology to include symptom mining increased its significance, and its comprehensive nature made it useful to a wide audience. Within the scope of this study, disease ontology modeling techniques, formal ontology languages, ontology visualization and creation software, and the limitations and concerns of the current project were investigated.

Background Research

In order to build the disease ontology, a valid and reputable source was needed to populate the database with accurate diseases and their symptoms. For this purpose, the ICD-10 (International Statistical Classification of Diseases-10) was considered. ICD-10 was the 10th revision of the ICD (International Statistical Classification of Diseases and Related Health Problems), a medical classification developed by the WHO (World Health Organization). It provided codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or disease [2]. Wikipedia's entry for ICD-10 had twenty-two chapters, each of which was a classification of diseases. Each chapter had blocks that listed diseases, and the number of diseases in a block varied from a few to a couple of hundred. Another source, ICD10Data.com, had a similar database of ICD-10 codes but did not contain any additional information about the diseases. Wikipedia was more beneficial to this project because its database was larger: each disease on the ICD-10 page had its own web page with a description, diagnosis, prevention, treatment, etc. Even though extracting the data was time-consuming, it was beneficial to use a single source that was complete and credible. A more recent version of the ICD by the WHO is ICD-11; at the time, however, ICD-11 was still in beta, was updated on a daily basis and was not scheduled to be released until 2015. Hence, it was not practical to use ICD-11 for the disease ontology. Another source that was considered was an e-book called An Index of Diseases and Their Treatment by Thomas Hawkes Tanner [3]. The e-book itself was slightly outdated, and unfortunately no newer editions of the book exist. The e-book is approximately five hundred pages long and lists diseases in alphabetical order, each with a description followed by its symptoms and treatment.
Even though the book was very detailed, its outdated content made it a fairly weak source on which to build the ontology. However, it was useful for putting a proof of concept in place.
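The chapter and block organisation of ICD-10 described above can be made concrete with a small code lookup. The following Python sketch is an illustration rather than part of the original project: the helper names are hypothetical, it handles only the common WHO ICD-10 code shape (a letter, two digits and an optional single digit after a dot), and its chapter table lists just four of the twenty-two chapters.

```python
import re
from typing import Optional, Tuple

# Common WHO ICD-10 code shape: a letter, two digits and an optional "."
# followed by one digit (e.g. "J45.0"). Longer national extensions such as
# ICD-10-CM are deliberately not handled by this sketch.
ICD10_PATTERN = re.compile(r"^([A-Z])(\d{2})(?:\.(\d))?$")

# Partial chapter table (numeral, title, first category, last category);
# only four chapters are listed, for illustration.
CHAPTERS = [
    ("I", "Certain infectious and parasitic diseases", "A00", "B99"),
    ("II", "Neoplasms", "C00", "D48"),
    ("IX", "Diseases of the circulatory system", "I00", "I99"),
    ("X", "Diseases of the respiratory system", "J00", "J99"),
]


def parse_icd10(code: str) -> Tuple[str, Optional[str]]:
    """Split a code into its three-character category and optional subcategory digit."""
    match = ICD10_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a WHO ICD-10 code: {code!r}")
    letter, digits, sub = match.groups()
    return letter + digits, sub


def chapter_of(code: str) -> Optional[str]:
    """Return the numeral of the listed chapter whose range contains the code, if any."""
    category, _ = parse_icd10(code)
    for numeral, _title, first, last in CHAPTERS:
        # Categories share a fixed letter-digit-digit shape, so plain string
        # comparison orders them the same way the classification does.
        if first <= category <= last:
            return numeral
    return None


print(chapter_of("J45.0"))
```

Because the table is partial, a valid code from an unlisted chapter (for example "E11") returns None rather than a wrong chapter.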

To construct the ontology, numerous existing formal languages were considered. Ontology languages were non-procedural (very high-level) languages derived from meta-languages and based on either first-order predicate calculus or description logic [4]. Widely used ontology languages included, but were not limited to, Ontolingua, OKBC (Open Knowledge Base Connectivity), OCML (Operational Conceptual Modelling Language), FLogic (Frame Logic) and LOOM. Other languages, created in the context of the Internet as recommendations of the W3C (World Wide Web Consortium), included XML (Extensible Markup Language), RDF (Resource Description Framework) and RDFS (Resource Description Framework Schema). Finally, newer languages for the specification of ontologies included XOL (XML-based Ontology Exchange Language), SHOE (Simple HTML Ontology Extensions) and OIL (Ontology Interchange Language) [4].

The disease ontology developed for this project was written in OWL (Web Ontology Language), which is endorsed by the W3C. To model the data, two specific languages were examined: RDFS and OWL. Both were data modeling languages, but OWL was better suited to the Semantic Web. RDFS allowed the user to express relationships between entities through a fixed format and provided special terms such as 'rdf:type' and 'rdfs:subClassOf'. OWL was similar but more extensive, allowing the user to say much more about the data model. It was designed to work well with database queries and semantic reasoners, and it provided useful annotations for importing data models. While both languages provided vocabularies, OWL provided a far larger vocabulary that permitted describing data in terms of set operators. In addition, OWL permitted defining equivalences across databases and allowed strict restrictions on property values. Another factor that made OWL more beneficial to this project was its rigidity: OWL specified not only how certain vocabulary could be used but also how it could not be used, whereas RDFS allowed everything. A further advantage of OWL was that it took into account how much computation was needed to draw inferences with all of its vocabulary terms. Simple inferences could be run quickly, others took a very long time, and some kinds of inferences could not be solved by any computer. OWL addressed this problem by letting the developer choose how expressive the model should be, as long as the computational limits were respected. In conclusion, OWL provided a larger vocabulary, which made it easier to describe the data model, and it allowed the user to tailor inferences to computational realities and to optimize them for particular applications (e.g. queries) [5].
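The contrast between RDFS and OWL drawn above can be illustrated with a toy, hand-rolled reasoner in Python. It is only a sketch: the `ex:` names and both functions are hypothetical, it implements just two RDFS entailment rules (subclass transitivity and type propagation) and one OWL-style check (`owl:disjointWith`), and a real OWL reasoner does far more.

```python
# Triples are (subject, predicate, object) tuples with prefixed names.
# All ex: names are hypothetical.
TRIPLES = {
    ("ex:Asthma", "rdfs:subClassOf", "ex:RespiratoryDisease"),
    ("ex:RespiratoryDisease", "rdfs:subClassOf", "ex:Disease"),
    ("ex:Disease", "owl:disjointWith", "ex:Patient"),
    ("ex:asthmaEntry", "rdf:type", "ex:Asthma"),
    # Deliberate modelling error: the same resource is also typed as a patient.
    ("ex:asthmaEntry", "rdf:type", "ex:Patient"),
}


def rdfs_closure(triples):
    """Apply two RDFS entailment rules until nothing new is derived:
    rdfs11 (rdfs:subClassOf is transitive) and
    rdfs9 (an instance of a class is also an instance of its superclasses)."""
    inferred = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        new = set()
        for a, b in subclass_pairs:
            for c, d in subclass_pairs:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))
        for s, p, o in inferred:
            if p == "rdf:type":
                for sub, sup in subclass_pairs:
                    if o == sub:
                        new.add((s, "rdf:type", sup))
        if new <= inferred:  # fixed point reached
            return inferred
        inferred |= new


def disjointness_violations(triples):
    """OWL-style check that RDFS lacks: individuals typed with two disjoint classes."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    violations = []
    for c, p, d in triples:
        if p == "owl:disjointWith":
            for individual, classes in types.items():
                if c in classes and d in classes:
                    violations.append((individual, c, d))
    return sorted(violations)


closed = rdfs_closure(TRIPLES)
print(disjointness_violations(TRIPLES), disjointness_violations(closed))
```

Run on the raw triples, the disjointness check finds nothing, because `ex:asthmaEntry` is not yet known to be an `ex:Disease`; once the closure derives that type, the OWL-style check flags the contradiction. Under RDFS alone, the same data would simply be accepted.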

Numerous tools existed to model ontologies; DOGMA, DogmaModeler, KAON, OntoClean, OnToContent, HOZO and Protégé were frequently used. To build the ontology, Protégé was used. Protégé was a free, open-source platform that provided easy-to-use tools to construct ontologies. It implemented a rich set of knowledge-modelling structures and actions that supported the creation, visualization and manipulation of ontologies in various representation formats [6]. Protégé could save an ontology in numerous file formats; after much research, the OWL/XML format was used. In Protégé, the OntoGraf module provided visualization of the constructed ontology to help novice and expert users explore the disease ontology. To populate the ontology, one created entities and added keys and relations to validate the domain relations. Other features included object properties, data properties, comments, querying and the OntoGraf ontology visualizer. Last but not least, the simple user interface made it very attractive for a first-time user. Time was a major concern, and anything that helped simplify the project workload was essential.

Building a Disease Ontology

Protégé was used to build the ontology for source identification based on symptom mining. The ontology file was written in the OWL 2 language and serialized in the OWL/XML format. After much research, it was concluded that no existing ontology was similar to the one being developed: the ontologies found were simple lists of diseases, with no classification or relationships of any kind. A simple way to represent the ontology was to list the symptoms and classify the diseases according to them, as shown in Figure 1.

Figure 1: Simple Ontological Representation of Diseases

Figure 1 shows a simple approach to ontology visualization in which symptoms are listed and diseases are classified according to symptoms. A more detailed approach classifies diseases both into categories and by their symptoms, as shown in Figure 2.

Figure 2: Detailed Ontological Representation of Diseases Along With Symptoms

Figure 2 shows a more detailed representation of the ontology, in which diseases are classified into categories and by symptoms. It shows an ideal disease ontology, useful to medical personnel as well as casual users: it lists the diseases, classifies them and allows the user to build the ontology based on symptom mining.

Figure 3: High Level Design of Disease Ontology [7]

Figure 3 shows the high-level design of the ontology from a developer's point of view. The ontology database was built in Protégé and processed by the knowledge modeller, which produced its output in OWL/XML. This combined file was then processed again and displayed in the graphical user interface via the 3D rendering engine (OntoGraf), where the end user viewed and used the ontology in its 3D form.

Figure 4: Low Level Design of Disease Ontology

Figure 4 shows the low-level design (class diagram) of the ontology from a developer's point of view. An ontology had many diseases, and each disease belonged to one and only one ontology. Diseases were classified into two categories: symptoms and the ICD-10 sub-classification. Diseases and Ontology shared a whole-part relationship, resulting in composition, whereas Classification and Ontology shared ownership, resulting in aggregation. Symptoms and the ICD-10 sub-classification had an inheritance (generalization, or "is a") relationship with Classification.
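The relationships in the class diagram can be sketched in Python to show how each kind of UML relationship maps to code. The class and method names are hypothetical and are not taken from the project: diseases are created and owned by the ontology (composition), classifications are created elsewhere and only referenced (aggregation), and Symptom and ICD10SubClassification subclass Classification (generalization).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Classification:
    """Generalization: the 'is a' parent of Symptom and ICD10SubClassification."""
    name: str


@dataclass
class Symptom(Classification):
    pass


@dataclass
class ICD10SubClassification(Classification):
    code_range: str = ""


@dataclass
class Disease:
    name: str
    classifications: List[Classification] = field(default_factory=list)


class Ontology:
    def __init__(self, name: str, classifications: List[Classification]):
        self.name = name
        # Aggregation: classifications are created elsewhere and only referenced,
        # so they can outlive this ontology or be shared with another one.
        self.classifications = classifications
        # Composition: diseases are created by, and belong only to, this ontology.
        self._diseases: Dict[str, Disease] = {}

    def add_disease(self, name: str, classifications: List[Classification]) -> Disease:
        """Create a disease inside this ontology (by convention the only way to make one)."""
        disease = Disease(name, list(classifications))
        self._diseases[name] = disease
        return disease

    def diseases(self) -> List[Disease]:
        return list(self._diseases.values())


shared = [Symptom("Wheezing"),
          ICD10SubClassification("Chronic lower respiratory diseases", "J40-J47")]
onto = Ontology("DiseaseOntology", shared)
asthma = onto.add_disease("Asthma", shared)
```

Python does not enforce composition, so the sketch expresses it by convention: a Disease is only ever created through `Ontology.add_disease`, while the shared classification objects are referenced rather than copied.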

Limitations and concerns regarding the ontology related primarily to the size of the project. Building a completely functional disease ontology with source identification based on symptom mining in the allocated time proved difficult, but not impossible.

Disease Ontology Analysis

Several relationships emerged while developing the Disease Ontology. The diseases were classified according to the ICD-10 classification of diseases, so all the classified categories of diseases had one major thing in common: they were part of the superclass Classification. Each disease in the ontology had its symptoms listed as members. In total, approximately seven hundred symptoms were classified as members, and several of them appeared in multiple diseases throughout the ontology. This repeated membership enabled the symptom mining aspect of the disease ontology: users could list the symptoms they were interested in and display the diseases associated with those symptoms. The Disease Ontology also had a second classification based on symptoms, which was further divided into subclasses of general symptoms such as aches, pains and blood-related symptoms. This sub-classification of symptoms caused another relationship to emerge. Symptoms were also classified alphabetically for archival purposes and to ease searching and sorting during visualization.
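Symptom mining as described above amounts to inverting the disease-to-symptom membership into a symptom-to-disease index and intersecting the results for the requested symptoms. A minimal Python sketch, assuming a tiny illustrative data set instead of the ontology's seven hundred symptoms; the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative toy data: each disease class lists its symptom members,
# mirroring how the ontology asserted symptoms as members of disease classes.
DISEASE_SYMPTOMS = {
    "Asthma": {"wheezing", "shortness of breath", "cough"},
    "Influenza": {"fever", "cough", "muscle aches"},
    "Common cold": {"cough", "sneezing", "sore throat"},
}


def build_index(disease_symptoms):
    """Invert disease -> symptoms into symptom -> diseases."""
    index = defaultdict(set)
    for disease, symptoms in disease_symptoms.items():
        for symptom in symptoms:
            index[symptom].add(disease)
    return index


def diseases_with(index, *symptoms):
    """Return the diseases that list every requested symptom, sorted by name."""
    if not symptoms:
        return []
    matches = set.intersection(*(index.get(s, set()) for s in symptoms))
    return sorted(matches)


index = build_index(DISEASE_SYMPTOMS)
print(diseases_with(index, "cough", "fever"))
```

Intersection keeps only diseases that match every requested symptom; ranking diseases by how many of the requested symptoms they share would be a natural, more forgiving variation.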

The ontology contained one superclass named Thing, a default class that Protégé generated automatically. Thing contained two subclasses, Classification and Symptoms. Classification contained thirteen subclasses, each of which contained between four and more than fifteen further subclasses. These were the parent blocks described in the ICD-10, and each parent block contained between three and more than fifty sibling classes. Each sibling class was a disease that populated the ontology database.

The subclass Symptoms contained the symptoms used in the ontology as members of the sibling classes. Symptoms itself contained seventeen subclasses. Sixteen of them were generalizations of symptoms, with names such as aches, pains, infections and swellings, and contained as members the more specific symptoms used in the ontology. The seventeenth subclass, named Alphabetical, listed all seven hundred symptoms alphabetically.
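The hierarchy described in the two paragraphs above can be modelled as a child-to-parent map, which makes it easy to walk from a disease up to Thing or to list a class's direct subclasses. In this Python sketch the class names are an illustrative fragment (the chapter and block are real ICD-10 groupings, but whether the ontology used these exact labels is an assumption), and the helper functions are hypothetical.

```python
# Illustrative fragment of the class hierarchy, stored as child -> parent.
PARENT = {
    "Classification": "Thing",
    "Symptoms": "Thing",
    "Diseases of the respiratory system": "Classification",  # Classification subclass
    "Chronic lower respiratory diseases": "Diseases of the respiratory system",  # parent block
    "Asthma": "Chronic lower respiratory diseases",  # sibling class, i.e. a disease
    "Aches": "Symptoms",         # one of the general symptom groups
    "Alphabetical": "Symptoms",  # the alphabetical listing subclass
}


def ancestors(cls):
    """Walk from a class up to Thing, returning each superclass in order."""
    path = []
    while cls in PARENT:
        cls = PARENT[cls]
        path.append(cls)
    return path


def children(parent):
    """List the direct subclasses of a class, sorted by name."""
    return sorted(child for child, p in PARENT.items() if p == parent)


def alphabetical(symptoms):
    """Group symptom names by first letter, sorting case-insensitively."""
    groups = {}
    for name in sorted(symptoms, key=str.casefold):
        groups.setdefault(name[0].upper(), []).append(name)
    return groups


print(ancestors("Asthma"))
```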

Each disease in the ontology had four descriptive sections. Two of these sections stated the ICD-10 code related to the disease. The third section, named 'Description', contained the disease's name, and the fourth, named 'Information', contained information about the disease. This information was available only for certain diseases in the ontology and was extracted directly from WebMD.

The OWL file itself contained 131,836 lines of code and was 5.21 MB in size. All information regarding classes, members, unions, disjoints, etc. was declared between the '<Ontology> … </Ontology>' tags.

Classes were declared between:

<Declaration> … </Declaration>

Subclasses were declared between:

<SubClassOf> … </SubClassOf>

Disjoint classes were declared between:

<DisjointClasses> … </DisjointClasses>

Symptom keywords that were classified as members of a class were declared between:

<ClassAssertion> … </ClassAssertion>

ICD-10 codes for each disease were listed under:

<AnnotationAssertion>
…
<IRI>#CLASS</IRI>
…
</AnnotationAssertion>
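The following Python sketch uses the standard library's xml.etree.ElementTree to emit the declaration, subclass, class-assertion and annotation axioms listed above in OWL 2 XML syntax for a single disease. It is an illustration only: the IRIs, the `disease_axioms` and `named_class` helpers and the use of an annotation property named `CodeName_pl` to carry the ICD-10 code are assumptions, not details confirmed by the original file.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
# Serialize OWL elements without a prefix, as OWL/XML files usually do.
ET.register_namespace("", OWL_NS)


def owl(tag):
    """Qualify a tag name with the OWL 2 XML namespace."""
    return f"{{{OWL_NS}}}{tag}"


def named_class(parent, name):
    """Append a <Class IRI="#name"/> reference to parent."""
    return ET.SubElement(parent, owl("Class"), IRI=f"#{name}")


def disease_axioms(disease, block, symptoms, icd_code):
    """Build the OWL/XML axioms for one disease class (hypothetical helper)."""
    axioms = []

    declaration = ET.Element(owl("Declaration"))  # declares the disease class
    named_class(declaration, disease)
    axioms.append(declaration)

    subclass = ET.Element(owl("SubClassOf"))  # the disease is a subclass of its block
    named_class(subclass, disease)
    named_class(subclass, block)
    axioms.append(subclass)

    for symptom in symptoms:  # each symptom is asserted as a member of the disease class
        assertion = ET.Element(owl("ClassAssertion"))
        named_class(assertion, disease)
        ET.SubElement(assertion, owl("NamedIndividual"), IRI=f"#{symptom}")
        axioms.append(assertion)

    annotation = ET.Element(owl("AnnotationAssertion"))  # ICD-10 code on the class IRI
    ET.SubElement(annotation, owl("AnnotationProperty"), IRI="#CodeName_pl")
    ET.SubElement(annotation, owl("IRI")).text = f"#{disease}"
    ET.SubElement(annotation, owl("Literal")).text = icd_code
    axioms.append(annotation)
    return axioms


root = ET.Element(owl("Ontology"), ontologyIRI="http://example.org/disease-ontology")
root.extend(disease_axioms("Asthma", "J40-J47", ["Wheezing", "Cough"], "J45"))
print(ET.tostring(root, encoding="unicode"))
```

Generating axioms from extracted data in this way is one option for a file of this size; the original ontology was built interactively in Protégé.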

Ontology Tool Analysis

In order to build the ontology within the strict time frame provided, the developer had to choose the development tool wisely. The tool needed to be user friendly, quick to use with minimal knowledge of ontology development or of the software itself, and to provide vast customization options. Protégé installation was minimal and easy. The add-ons supported by Protégé, including OntoGraf, had to be installed separately, but their installation was very easy as well. Protégé version 4.1 was used to construct the disease ontology. A major drawback of this version was that it supported only approximately ten thousand frames without major interface delays, whereas the disease ontology contained upwards of thirty thousand frames. The resulting delays ranged from 10 to 30 seconds, depending on how extensive the ontology was. Because of this constraint, the development phase was slightly sluggish and frustrating, as the developer had a strict time frame to adhere to. However, this interface delay was minor compared to the advantages gained from using Protégé. For a novice user who had minimal knowledge of what an ontology was, let alone of ontology editing software such as Protégé, the software's ease of use and excellent graphical user interface outweighed any problems and glitches that occurred during the development phase.

Protégé had several 'tabs' that were used to customize and edit the ontology, including the Classes, Annotations, Descriptions, Data Properties, Annotation Properties, Individuals, OWLViz and OntoGraf tabs. The Classes tab was used to add subclasses and sibling classes, along with annotations, members, unions and disjoints from other classes. The Data Properties and Annotation Properties tabs were used to edit the data types listed under members and annotations (e.g. has_Description, CodeName_pl, comment, etc.). The Individuals tab was used to analyse each class in detail. OWLViz and OntoGraf were used to visualize the ontology. OntoGraf offered numerous ways of viewing the ontology, of which the radial view was the best suited to the disease ontology. Other OntoGraf tools included zooming, node and arc type selection, screenshots, node tip configuration and export options, to name a few. In conclusion, Protégé served as an excellent tool for developing the ontology within a strict time frame, offered numerous customizations, and enabled the symptom mining aspects and requirements of the disease ontology.

Conclusion

Developing the Disease Ontology served as an excellent way to expand my knowledge of ontologies. From the beginning, it was evident that developing a disease ontology to enable source identification based on symptom mining would be a monumental task. Development began after weeks of research into how existing ontologies were developed, what resources were needed (e.g. software) and which credible sources housed extensive information on diseases. Gathering all the information to populate the database was easy but very time-consuming. Designing the relations between classes, subclasses, etc. proved difficult, and this step was crucial to getting the final ontology correct. With the database populated with credible information and relationships between diseases created through symptoms, the developed Disease Ontology enables users to perform symptom mining.

Appendix

[8]

Acknowledgment

Hardik Patel wishes to acknowledge Dr. Coskun Bayrak for guiding him through the ontology development and providing excellent feedback for improvements to documentation and the ontology being developed.
