Modular Organization Of The Human Disease Genes

Published Date: 02 Nov 2017

YUE Yi1, WU YunZhi1, ZUO YongChun2, LI YuanJing3, FAN GuoHua1 & ZHANG ShiHua3*

1College of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China

2The National Research Center for Animal Transgenic Biotechnology, Inner Mongolia University, Hohhot 010021, China

3College of Life Science, Anhui Agricultural University, Hefei 230036, China

Abstract. Recent analyses of disease textual data indicate that genes associated with clinically related diseases tend to have similar functions. Disease textual data can therefore mirror the modular organization implied in the association map of disease genes. In this study, we constructed a human disease gene network based on the phenotypic relatedness of the diseases associated with each gene. Our results revealed a modular organization of these disease genes, and this modularity correlates well with the physiological classification of genetic diseases. In this network, we detected 139 gene modules and found that the protein products within a gene module tend to interact with each other. Genes in such modules are functionally related and may represent the shared genetic basis of their corresponding genetic diseases. These genes, alone or in combination, could therefore be considered potential therapeutic targets in future clinical therapy.

Introduction

When used together with genetic information, disease textual data can help to explore relationships between genetic diseases and mutation-bearing genes [1-4]. However, disease textual resources such as the Online Mendelian Inheritance in Man (OMIM) [5] and the PhenomicDB database [6-7] remain difficult to process because they lack a standardized representation for disease descriptions. Despite these obstacles, some successful groundwork has been laid in exploiting such textual data. For example, Freudenberg et al. [8] manually extracted and clustered nearly 1,000 diseases from the OMIM database according to their phenotypic relatedness, using periodicity, etiology, tissue, age of onset and mode of inheritance as classification indices. Their results showed that genes causing related diseases have similar Gene Ontology (GO) annotations. Groth et al. [3] used textual clustering to group disease genes based on their disease textual data from the PhenomicDB database; these clusters were shown to correlate with several indicators of biological coherence in gene groups, such as GO annotation and protein-protein interaction. We previously reported a text-based phenotype network model for detecting disease-specific gene modules, and further analysis indicated that the gene modules corresponding to specific disease clusters show strong functional correlation at the levels of GO annotation, disease type enrichment, and protein-protein interaction [9]. Together, these studies indicate that genes leading to clinically related diseases tend to have similar functions. Such related genes work together in a modular architecture, such as a protein complex or a cellular pathway, to perform a specific biological function [10-13]. The functional relationships among these genes agree with the modular property of most biological networks, suggesting that densely connected subgraphs exist in the functional network of disease genes.

In this study, we constructed such a functional network of human disease genes and investigated its modular architecture. We determined the associations between disease genes in the network from the phenotypic relatedness of their associated diseases. The disease gene network was shown to be highly modular by two modularity measures originally proposed by Park et al. [14]. From the network, we extracted 139 gene modules and found that this modularity is reflected at the functional level of protein-protein interaction (PPI). Of these 139 gene modules, 127 (91.4%) were significantly enriched in only one or two disease types. Thus, our network-based framework revealed that disease genes and their associated genetic diseases show a high level of agreement in functional interplay, although they belong to two different biological levels.

1. Materials and methods

1.1 Construction of the human disease gene network

The OMIM database, a comprehensive resource of human disease textual data, provides detailed descriptions of genetic diseases caused by mutations in individual genes. In our approach, the text and clinical synopsis sections of each disease record were merged into a single textual record. We applied the MetaMap Transfer program [15] to map disease textual records onto Unified Medical Language System terminologies [16-17]. The records were thus converted to textual vectors and refined with the term frequency–inverse document frequency weighting method [18], from which the phenotypic relatedness between different disease textual records was calculated. Because genes associated with similar diseases are more likely to have similar functions, we used this phenotypic relatedness, assessed by random permutation of the textual vectors, to decide the functional association of disease genes in the human disease gene network. In the case of locus heterogeneity, where a disease phenotype arises from mutations in different disease genes, we placed a link between every pair of such disease genes because their phenotypic relatedness is maximal by definition.
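The weighting and relatedness steps can be illustrated with a minimal sketch. It assumes that each record has already been reduced to a list of concept terms and that relatedness is measured by the cosine of the weighted vectors; the text above does not name the similarity measure, so the cosine is an assumption, as are the function names and the toy records. The permutation step used to decide which links are kept is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(records):
    """Map each record id to a {term: tf-idf weight} dictionary.

    records: dict of record id -> list of concept terms (assumed input shape).
    """
    n_docs = len(records)
    # Document frequency: number of records in which each term occurs.
    doc_freq = Counter()
    for terms in records.values():
        doc_freq.update(set(terms))
    vectors = {}
    for rec_id, terms in records.items():
        counts = Counter(terms)
        vectors[rec_id] = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy records; the term lists are illustrative, not OMIM data.
records = {
    "disease_A": ["anemia", "fatigue", "pallor"],
    "disease_B": ["anemia", "pallor", "jaundice"],
    "disease_C": ["seizure", "ataxia", "fatigue"],
}
vecs = tfidf_vectors(records)
sim_ab = cosine(vecs["disease_A"], vecs["disease_B"])
sim_ac = cosine(vecs["disease_A"], vecs["disease_C"])
print(f"A-B: {sim_ab:.3f}, A-C: {sim_ac:.3f}")
```

In this toy example, records A and B share two terms and therefore score higher than A and C, which share only one.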

1.2 Modular evaluation of the human disease gene network

We used two modularity indices, dyadicity and heterophilicity [14], to evaluate the modular organization of the disease gene network. Dyadicity (D) measures the enrichment of edges between nodes with the same attribute relative to the number expected if that attribute were randomly distributed over the network. Heterophilicity (H) measures the tendency of nodes to connect to nodes with a different attribute. In the disease gene network, disease genes whose associated diseases belong to the same disease type were regarded as having the same attribute. On this basis, we computed D and H for each disease type.

A value of D>1 (dyadic) or D<1 (antidyadic) indicates that disease genes whose associated diseases belong to a given disease type are connected to one another more or less often, respectively, than expected in a random configuration. Likewise, H>1 (heterophilic) or H<1 (heterophobic) indicates that these genes are connected to disease genes of other disease types more or less often, respectively, than expected at random. When D>1 and H<1, we can conclude that the human disease gene network has a highly modular organization, because disease genes of a given disease type show a clear tendency to cluster in the network.
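As an illustration of the two indices, the following sketch computes D and H for one attribute. It uses the expected edge counts under a random assignment of the attribute, with n1 labeled nodes among N nodes and connectance p = 2M / (N(N - 1)), as we understand the definitions in [14]. The function name and the toy network are hypothetical.

```python
def dyadicity_heterophilicity(edges, labeled, n_nodes):
    """Return (D, H) for one attribute in an undirected network.

    edges:   iterable of (u, v) pairs, no self-loops or duplicates
    labeled: set of nodes that carry the attribute
    n_nodes: total number of nodes N in the network
    """
    edges = list(edges)
    m_total = len(edges)
    n1 = len(labeled)
    n0 = n_nodes - n1
    # Observed edges inside the labeled set and across its boundary.
    m11 = sum(1 for u, v in edges if u in labeled and v in labeled)
    m10 = sum(1 for u, v in edges if (u in labeled) != (v in labeled))
    # Connectance: fraction of all possible node pairs that are linked.
    p = 2.0 * m_total / (n_nodes * (n_nodes - 1))
    expected_m11 = p * n1 * (n1 - 1) / 2.0
    expected_m10 = p * n1 * n0
    d = m11 / expected_m11 if expected_m11 > 0 else float("nan")
    h = m10 / expected_m10 if expected_m10 > 0 else float("nan")
    return d, h


# Hypothetical toy network: genes a, b, c share one disease type.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("c", "d")]
labeled = {"a", "b", "c"}
d_value, h_value = dyadicity_heterophilicity(edges, labeled, 6)
print(f"D = {d_value:.2f}, H = {h_value:.2f}")
```

Here the labeled genes form a triangle with a single outgoing link, so the sketch yields D>1 and H<1, the combination interpreted above as modular.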

1.3 Mining gene modules and evaluation with protein interaction intensity

We used a graph-theoretic clustering algorithm [19] to extract gene modules from the human disease gene network. To evaluate the functional relations of genes in the extracted gene modules, we defined the protein-protein interaction strength Sppi of a gene module i as the ratio of the actual number of protein interactions to the maximum possible number in that module:

$$S_{ppi} = \frac{N_{real}}{k(k-1)/2} = \frac{2N_{real}}{k(k-1)}$$

where Nreal denotes the number of interacting protein product pairs actually observed in the gene module, and k denotes the number of protein products in the gene module that have an interaction partner in the Human Protein Reference Database (HPRD) [20].
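A minimal sketch of this ratio is given below. It assumes that k counts the module members with at least one partner anywhere in the interaction set, that the maximum number of pairs is k(k - 1)/2, and that a module with fewer than two such members scores 0. The function name and the toy interaction pairs are hypothetical and do not come from HPRD.

```python
from itertools import combinations


def ppi_strength(module_genes, ppi_pairs):
    """Fraction of possible protein pairs in a module that actually interact.

    module_genes: iterable of gene symbols in the module
    ppi_pairs:    set of frozenset({a, b}) interaction pairs
    """
    # Proteins with at least one partner in the reference interaction set.
    with_partner = set()
    for pair in ppi_pairs:
        with_partner.update(pair)
    members = sorted(set(module_genes) & with_partner)
    k = len(members)
    if k < 2:
        return 0.0  # assumption: no pair can be formed, so the score is 0
    n_real = sum(
        1 for a, b in combinations(members, 2) if frozenset((a, b)) in ppi_pairs
    )
    return 2.0 * n_real / (k * (k - 1))


# Hypothetical toy data: G4 has no partner, X lies outside the module.
ppi_pairs = {
    frozenset(("G1", "G2")),
    frozenset(("G2", "G3")),
    frozenset(("G3", "X")),
}
strength = ppi_strength(["G1", "G2", "G3", "G4"], ppi_pairs)
print(f"{strength:.3f}")
```

In the toy module, three members have partners and two of the three possible pairs interact, so the score is 2/3.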

1.4 Disease class enrichment analysis

Disease class enrichment analysis was used to investigate whether the genes in a given gene module tend to have associated disease phenotypes belonging to the same disease class, that is, whether the gene module is enriched in a certain disease class. The procedure was as follows: i) for a given gene module, we randomly sampled from all disease genes to build 10,000 pseudo gene modules containing the same number of disease genes as the real gene module; ii) in the real gene module, we determined the disease classes of the associated disease phenotypes and counted the number of disease phenotypes belonging to each disease class; and iii) finally, we computed the P-value for every disease class found in the real gene module based on the random controls.
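The three steps above can be sketched as an empirical test. In this sketch the P-value of a class is taken to be the fraction of pseudo modules whose count for that class is at least as large as the observed count, with an add-one correction; both choices are assumptions, since the exact estimator is not stated. The function name and the toy annotation are hypothetical.

```python
import random


def class_enrichment_pvalues(module, gene_classes, n_random=10000, seed=0):
    """Empirical P-value per disease class for one gene module.

    module:       list of genes in the real module
    gene_classes: dict of gene -> list of disease classes of its phenotypes
    """
    rng = random.Random(seed)
    all_genes = sorted(gene_classes)

    def class_counts(genes):
        counts = {}
        for gene in genes:
            for disease_class in gene_classes[gene]:
                counts[disease_class] = counts.get(disease_class, 0) + 1
        return counts

    observed = class_counts(module)  # step ii: counts in the real module
    exceed = {c: 0 for c in observed}
    for _ in range(n_random):  # step i: size-matched pseudo modules
        pseudo = rng.sample(all_genes, len(module))
        counts = class_counts(pseudo)
        for disease_class, obs in observed.items():
            if counts.get(disease_class, 0) >= obs:
                exceed[disease_class] += 1
    # Step iii: add-one correction keeps the estimate above zero (assumption).
    return {c: (exceed[c] + 1) / (n_random + 1) for c in observed}


# Hypothetical annotation: 5 hematological genes among 40 disease genes.
gene_classes = {f"H{i}": ["hematological"] for i in range(5)}
gene_classes.update({f"O{i}": ["other"] for i in range(35)})
module = [f"H{i}" for i in range(5)]
pvalues = class_enrichment_pvalues(module, gene_classes, n_random=2000)
print(pvalues)
```

The example uses 2,000 pseudo modules only to keep the run short; the number of random controls is a parameter.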

2. Results

2.1 Modular organization of the human disease genes

We collected all the known 1,865 disease genes from the OMIM database and constructed the human disease gene network, which possesses 21,514 links among 1,685 disease genes, with a giant component of 1,607 (99.36%) disease genes and 21,428 (98.9%) links (Figure 1). In the giant component, disease gene nodes are colored according to the disease classes of their associated diseases. Here, we followed the disease classification of Goh et al. [21], who manually assigned diseases to 22 main disease classes according to the physiological system affected. Visual inspection suggests that disease genes whose associated disease phenotypes belong to the same disease class tend to group together, forming distinct modules in the network.

Table 1 lists the two modularity indices, dyadicity D and heterophilicity H, for the 22 disease classes. The highly modular organization of the human disease gene network is supported by the observation that all disease types were dyadic (D>1) and the majority (77.27%) were heterophobic (H<1), together indicating a high correlation between the modularity and the physiological classification of genetic diseases. Nevertheless, some disease types, e.g., developmental, connective tissue, skeletal and dermatological, were heterophilic, indicating that these disease types share phenotypic features with other types of diseases. This is reasonable for developmental diseases because they tend to cause pathological changes in multiple tissues and thereby share phenotypic features with different categories of diseases. Regarding connective tissue, skeletal and dermatological diseases, we surmise that they may affect other tissues during disease development and therefore overlap with other types of diseases. For unclassified diseases (H>1, heterophilic), no logical explanation can be offered because their disease classification is uncertain.

2.2 Gene products in a gene module tend to interact with each other

Of the 139 gene modules extracted from the human disease gene network, 14 have Sppi=0 because none of their members interacts with another member in the HPRD. Of the remainder, 9 gene modules have Sppi=1 and the others have Sppi between 0 and 1. The mean Sppi over all modules is 0.34. To test the statistical significance of this mean, 139 gene sets of the same sizes as the corresponding gene modules were drawn from all the disease genes as a random control. We built 10,000 such random controls, and the mean Sppi of the real modules was significantly higher than that of the random groups (P-value=2.5e-3). This indicates that gene products in a gene module tend to interact with each other and to take part in the same biological process; that is, these gene products may act together, as a fundamental functional unit of the biological system, in the same cellular pathway or molecular complex.
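The size-matched random control can be sketched generically as follows. The sketch takes any module scoring function as a parameter and estimates the P-value as the fraction of random controls whose mean score is at least the observed mean, with an add-one correction; this estimator is an assumption. The function names and the toy scoring function, which stands in for the interaction strength, are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def size_matched_pvalue(modules, all_genes, score, n_random=10000, seed=0):
    """Compare the mean module score with size-matched random gene sets."""
    rng = random.Random(seed)
    pool = sorted(all_genes)
    observed = mean([score(m) for m in modules])
    hits = 0
    for _ in range(n_random):
        # One random control: one random gene set per real module, same size.
        control = [rng.sample(pool, len(m)) for m in modules]
        if mean([score(c) for c in control]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


# Hypothetical toy data: 30 genes in 10 families of 3; genes of the same
# family are treated as interacting (a stand-in for real interaction data).
family = {f"g{i}": i // 3 for i in range(30)}


def same_family_fraction(genes):
    """Toy score: fraction of gene pairs in the set that share a family."""
    pairs = list(combinations(genes, 2))
    return sum(family[a] == family[b] for a, b in pairs) / len(pairs)


modules = [["g0", "g1", "g2"], ["g3", "g4", "g5"]]
observed_mean, p_value = size_matched_pvalue(
    modules, family, same_family_fraction, n_random=1000
)
print(observed_mean, p_value)
```

Passing the scoring function as an argument keeps the control procedure independent of how the interaction strength itself is computed.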

Figure 2 illustrates the PPI subgraph of gene module 74 against the HPRD background. The subgraph includes 10 disease genes (blue nodes with gene names) together with the genes connected to them (gray nodes without names). This graphical representation indicates that these disease genes tend to interact functionally at the PPI level. In addition, GO term analysis showed that these genes were enriched in the GO category myeloid cell differentiation (GO:0030099, P-value=3.2e-3), suggesting that the gene module behaves as a functional entity associated with myeloid cell differentiation. At the phenotype level, it is notable that the associated disease phenotypes all belong to hematological diseases, which may arise from the disruption of myeloid cell differentiation.

2.3 Gene modules tend to enrich in certain disease classes

We used the disease class annotations described by Goh et al. [21] to conduct the disease class enrichment analysis. The results showed that 113 (81.3%) gene modules were significantly enriched in only one of the 22 disease classes, 14 (10.1%) in two disease classes and 12 (8.6%) in three or more disease classes. Our statistical results also indicated that the associated disease phenotypes of a given gene module tend to belong to the same disease class (Figure 3). Taken together, the vast majority (91.4%) of gene modules are significantly specific to one or two disease classes, indicating that these gene modules represent a shared genetic origin of the associated diseases, and that genes in a given gene module may serve as a proxy for related diseases in future clinical therapy.

3. Discussion

To explore all known associations between phenotypes and disease genes, Goh et al. [21] constructed a bipartite graph of all known genetic diseases and disease genes. In the disease gene projection of that graph, two genes are connected if they are involved in the same disease. In the present study, we instead constructed the human disease gene network from the phenotypic relatedness of the corresponding genetic diseases. Disease textual data provide a valuable window for dissecting genotype–phenotype associations, and textual relatedness appears to be a suitable measure for deciding disease gene interactions in the human disease gene network. In addition, the human disease gene network provides a gene-centered view of the disease association map. From this view, we can explore the molecular mechanisms underlying most genetic diseases. For example, genes in the gene modules extracted from the network have been shown to interact functionally, and their associated diseases are clinically similar at the phenotype level. This suggests that related genes may cooperate to perform cellular functions whose disruption contributes to certain diseases. If confirmed experimentally, this may serve as a clue to assist clinicians in future drug-target screening.
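For comparison, the projection rule of the bipartite approach, in which two genes are linked when they are involved in the same disease, can be sketched in a few lines. The function name and the disease and gene identifiers are hypothetical.

```python
from itertools import combinations


def project_gene_network(disease_to_genes):
    """Link every pair of genes that share at least one disease.

    disease_to_genes: dict of disease id -> list of associated genes
    Returns a set of (gene_a, gene_b) edges with gene_a < gene_b.
    """
    edges = set()
    for genes in disease_to_genes.values():
        # Sorting gives each undirected edge one canonical orientation.
        for gene_a, gene_b in combinations(sorted(set(genes)), 2):
            edges.add((gene_a, gene_b))
    return edges


# Hypothetical toy associations.
disease_to_genes = {
    "D1": ["G1", "G2", "G3"],
    "D2": ["G3", "G4"],
    "D3": ["G5"],
}
gene_edges = project_gene_network(disease_to_genes)
print(sorted(gene_edges))
```

A disease with a single associated gene, such as D3 here, contributes no edge, so its gene stays isolated under this rule.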

Our network modeling method revealed a clear modular organization in the network of human disease genes. In the network, the edge between two disease genes represents their phenotypic relatedness; thus the modular organization supports the modular property already reported for human genetic diseases [22-23]. Our findings also showed that disease genes and their associated genetic diseases show a high level of agreement in functional interplay. We believe such functional agreement will encourage the integrative analysis of biological data from different levels. For example, the phenotypic relatedness of two genes in the human disease gene network could be combined with gene expression, protein interactions and GO annotation to predict candidate genes in an integrated network approach.

The measure of gene interactions in the human disease gene network is not fully quantitative because of the unstructured nature of human disease textual data. A weighted human disease gene network should therefore be considered, so that the network is more informative and better suited to graph-based clustering algorithms. As disease textual data become more comprehensive, we will be able to construct a more complete map of human disease genes, which will make it feasible to investigate the associations among the genome, interactome, phenome and other omics levels. We believe these efforts can inform our understanding of the relationship between human diseases and the underlying genetic mechanisms, and further help to uncover the pathophysiological foundations of most genetic diseases.

References

1. Freimer, N. and C. Sabatti, The human phenome project. Nat Genet, 2003. 34(1): p. 15-21.

2. Lussier, Y., et al., PhenoGO: assigning phenotypic context to gene ontology annotations with natural language processing. Pac Symp Biocomput, 2006: p. 64-75.

3. Groth, P., et al., Mining phenotypes for gene function prediction. BMC Bioinformatics, 2008. 9: p. 136.

4. Loscalzo, J., I. Kohane, and A.L. Barabasi, Human disease classification in the postgenomic era: a complex systems approach to human pathobiology. Mol Syst Biol, 2007. 3: p. 124.

5. Hamosh, A., et al., Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res, 2005. 33(Database issue): p. D514-7.

6. Groth, P., et al., PhenomicDB: a new cross-species genotype/phenotype resource. Nucleic Acids Res, 2007. 35(Database issue): p. D696-9.

7. Kahraman, A., et al., PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics. Bioinformatics, 2005. 21(3): p. 418-20.

8. Freudenberg, J. and P. Propping, A similarity-based method for genome-wide prediction of disease-relevant human genes. Bioinformatics, 2002. 18 Suppl 2: p. S110-5.

9. Zhang, S.H., et al., From phenotype to gene: detecting disease-specific gene functional modules via a text-based human disease phenotype network construction. FEBS Lett, 2010. 584(16): p. 3635-43.

10. Badano, J.L. and N. Katsanis, Beyond Mendel: an evolving view of human genetic disease transmission. Nat Rev Genet, 2002. 3(10): p. 779-89.

11. Brunner, H.G. and M.A. van Driel, From syndrome families to functional genomics. Nat Rev Genet, 2004. 5(7): p. 545-51.

12. Gandhi, T.K., et al., Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nat Genet, 2006. 38(3): p. 285-93.

13. Kann, M.G., Protein interactions and disease: computational approaches to uncover the etiology of diseases. Brief Bioinform, 2007. 8(5): p. 333-46.

14. Park, J. and A.L. Barabasi, Distribution of node characteristics in complex networks. Proc Natl Acad Sci U S A, 2007. 104(46): p. 17916-20.

15. Aronson, A.R., Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp, 2001: p. 17-21.

16. Bodenreider, O., The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res, 2004. 32(Database issue): p. D267-70.

17. Gu, H.H., et al., Evaluation of a UMLS Auditing Process of Semantic Type Assignments. AMIA Annu Symp Proc, 2007: p. 294-8.

18. Wilbur, W.J. and Y. Yang, An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts. Comput Biol Med, 1996. 26(3): p. 209-22.

19. Bader, G.D. and C.W. Hogue, An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 2003. 4: p. 2.

20. Peri, S., et al., Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res, 2003. 13(10): p. 2363-71.

21. Goh, K.I., et al., The human disease network. Proc Natl Acad Sci U S A, 2007. 104(21): p. 8685-90.

22. Oti, M. and H.G. Brunner, The modular nature of genetic diseases. Clin Genet, 2007. 71(1): p. 1-11.

23. Oti, M., M.A. Huynen, and H.G. Brunner, Phenome connections. Trends Genet, 2008. 24(3): p. 103-6.
