Validation Of Predicted Therapeutic Targets

Published Date: 02 Nov 2017

1 Life and Environmental Sciences, Deakin University, Geelong, Victoria, Australia.

2 Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States.

3 Victor Chang Cardiac Research Institute, 405 Liverpool St, Darlinghurst, 2010, NSW, Australia.

4 School of Medicine, Deakin University, Geelong, Victoria, Australia.

5 Australian Animal Health Laboratory, CSIRO Animal, Food and Health Sciences, Portarlington Road, Geelong, Victoria, Australia.

Email address:

MAW: [email protected]

MPG: [email protected]

KAM: [email protected]

TMC: [email protected]

CRS: [email protected]

Abstract

Background

Human genome sequencing has rapidly advanced our knowledge of disease genetics. Vast datasets linking phenotypes with genetic loci now exist, but our ability to translate these data to the clinic has not kept pace. Meanwhile, pharmaceutical companies still conduct costly clinical studies to demonstrate the safety and efficacy of novel therapeutic drugs. In contrast, in-silico candidate gene prediction systems allow rapid identification of disease genes by ranking the most probable candidate genes linked to the disease phenotype under investigation. Integrating drug-target data with a candidate gene prediction system such as Gentrepid (www.gentrepid.org) can identify novel phenotypes which may benefit from current therapeutics. Such a drug repositioning tool can save valuable time and money spent on phase I clinical trials.

Results

We adopted a simple and comprehensive approach to integrate drug data with candidate gene predictions at the systems level. We previously used Gentrepid as a platform to predict 1,805 candidate genes for the seven complex diseases considered in the WTCCC study [1], namely Type 2 Diabetes (T2D), Bipolar Disorder (BD), Crohn's Disease (CD), Hypertension (HT), Type 1 Diabetes (T1D), Coronary Artery Disease (CAD) and Rheumatoid Arthritis (RA). Using the publicly available drug databases TTD, PharmGKB and DrugBank as repositories of drug-target associations, we identified ~22% of the predicted candidate genes as novel therapeutic targets and found that 99% (2,132 of 2,145) of the drugs acting on the predicted targets are feasible for repositioning. We validated our predictions against therapeutic targets which had at least one PubMed citation for a phenotype of interest. The AUC values (at a 95% confidence interval) suggested that our predictions were highly significant rather than occurring by chance. Hence, our holistic approach uncovers novel therapeutic targets and therapeutics by exploiting available in-silico resources, accelerating the drug discovery pipeline.

Conclusions

We predicted 391 novel therapeutic targets for seven complex diseases by integrating genetic, bioinformatic and drug data. We have demonstrated that currently available drugs against novel therapeutic targets may be repositioned as novel therapeutics for the matching phenotype of the seven diseases studied here, quickly taking advantage of prior work in pharmaceutics to translate groundbreaking results in genetics to clinical treatments. The efficacy of these repositioned drugs can now be evaluated in phase II clinical trials.

Background

The development of new therapeutics for disease is essential to improve the human condition and lower the burden of disease. Due to our limited knowledge of the molecular basis of inherited diseases, comparatively few gene targets have been identified to date. The classical approach to developing therapeutics involves testing many thousands of compounds against a known target in order to identify a lead compound. The lead compound can then be further refined in silico and in vitro before heading into the lengthy and costly clinical trials pipeline. This process, which consists of phases I, II, III and IV, takes 10-17 years from target identification to FDA/EMEA approval, with a success probability of about 10% [2]. As a result, pharmaceutical companies spend an average of about 1.2 billion dollars to bring a new drug to market [3]. There is also a high risk associated with testing de novo drugs due to unforeseen adverse side effects, as seen in the case of Thalidomide, a drug used to treat morning sickness which resulted in devastating birth defects [4].

The identification of potential drug targets for human diseases is essential for the development of therapeutics, and will enable further research to unravel the mechanisms underlying inherited diseases. A new approach to therapeutic development is to identify new applications for already approved drugs, or for drugs that have successfully completed phase I clinical trials. This process of "drug repositioning" aims not to develop new drugs but to associate existing therapeutics with new phenotypes. Here we attempted to reposition existing drugs to treat common complex diseases using recently acquired Genome Wide Association Studies (GWAS) data.

Complex diseases are genetically intricate, polygenic and multifactorial. They frequently arise as a consequence of interaction between genes and the environment. Recently, GWAS have sought to tackle the previously intractable problem of the genetic basis of complex diseases. Sheer statistical power has enabled GWAS to successfully identify some associations between Single Nucleotide Polymorphisms (SNPs) and complex diseases [5]. However, the genotype-phenotype association signals are quite noisy. Also, analysis of GWAS data using highly stringent thresholds for statistical significance on isolated SNPs has limited the scope of gene discovery based on existing data.

Currently available gene discovery platforms, which enhance candidate gene identification, and public drug databases, which serve as major drug repositories, can be utilized in the therapeutic drug-target discovery process. Specific tools differ in the strategy adopted for calculating similarity and in the databases utilized. These tools are essentially designed to find the needle in the haystack. Gentrepid is one of many bioinformatic tools developed to help geneticists predict and prioritize potential candidate genes [6-8]. These tools are based on the assumption that genes with similar or related functions cause similar phenotypes [9]. We have previously developed protocols to analyze GWAS data using a multilocus approach which combines bioinformatic and genetic data [10]. Using a series of increasingly less conservative statistical thresholds, we attempted to discriminate the signal from the noise in the most statistically significant data. By reanalyzing the well studied WTCCC data set on seven complex diseases (Type 2 Diabetes (T2D), Bipolar Disorder (BD), Crohn's Disease (CD), Hypertension (HT), Type 1 Diabetes (T1D), Coronary Artery Disease (CAD) and Rheumatoid Arthritis (RA)), we were able to predict 1,805 candidate genes, compared to the 24 independent association signals originally identified [10]. The bioinformatics tool and knowledge base used for the predictions is Gentrepid, a candidate gene prediction system.

The salient features of Gentrepid are:

It utilizes two independent methods to prioritize candidate genes for human inherited disorders: Common Pathway Scanning, a systems biology approach, and Common Module Profiling, a domain-based homology recognition approach.

The Common Pathway Scanning (CPS) module is a systems biology approach based on the assumption that common phenotypes are associated with proteins that participate in the same complex or pathway [11]. Systems biology methods are currently favoured because of the attractiveness of their underlying philosophy. Their weakness is the limited coverage of the underlying systems biology knowledge bases. Many tools attempt to compensate for these gaps by extensive extrapolation of data from other species. Gentrepid CPS uses only human data to reduce the number of predicted false positives, i.e. compared to other prediction systems it makes fewer predictions, which are usually correct [7].

Common Module Profiling (CMP) is a novel sequence analysis approach based on the principle that candidate genes have similar functions to disease genes already determined for the phenotype [12].

In this method, sequences are parsed at the domain level, linking them directly to function [13]. Although this method was disappointing in our original benchmark using a set of nine oligogenic diseases with Mendelian inheritance, it produced a surprising number of statistically significant results when confronted with GWAS data on seven complex diseases [10]. This result was robust when compared with simulations using random SNPs, and may arise from an underlying role for homologous genes in complex diseases.

Over the past few years, in spite of high investment in GWAS to find novel drug targets for common diseases, the application of GWAS findings in the clinic has remained limited [14]. To address this deficit for the seven complex diseases studied in the WTCCC GWAS [1], we developed an approach to specifically identify potential drug targets for these diseases. In this paper, we demonstrate the identification of potential novel drug targets from a pool of predicted candidate genes by associating them with drug information extracted from publicly available drug databases, namely DrugBank (DB) [15], PharmGKB [16] and the Therapeutic Target Database (TTD) [17]. This study allows the identification of possible therapeutics for complex diseases by associating the predicted candidate genes that contribute to each disease with information on drug compounds that act on them. Gentrepid can thus be utilized as an initial drug screening tool to identify compounds of interest which may be used in drug repositioning in the initial phases of clinical trials.

Results and Discussion

We implemented a computational workflow to enable repositioning of drugs by integrating three datasets (Figure 1):

Genotype-phenotype data from the WTCCC GWAS on seven complex phenotypes ref;

Bioinformatic data on structural domains and systems biology to associate proteins that share common features or participate in the same complex or pathway ref;

Drug-Target association data from three drug databases ref.

Comparison of drug databases:

The therapeutic drug-target association data was extracted from three databases:

DrugBank, a cheminformatics/bioinformatics database containing 6,711 drugs against 3,411 human targets ref;

PharmGKB, a pharmacogenomics database containing 382 drugs against 566 human targets ref;

TTD, a database with comprehensive information about drug targets, containing 2,960 drugs against 545 human targets ref.

We compared the raw data (drug targets and drugs) from the three drug databases to determine the redundancy of the information they hold. With respect to drug targets, only ~2.3% of drug target entries were common to all three databases (Fig 2A). This low overlap suggests that a large number of targets are available for repositioning. When the databases were compared in a pairwise fashion, the proportion of common targets ranged from 5-10%. Hence, the databases are fairly complementary, and each contains a significant amount of information specific to that database. TTD has the fewest unique targets (129), while DB and PharmGKB include 2,884 and 326 respectively (Fig 2A).

We also compared the number of drugs in the three drug databases, as shown in Figure 2B. DrugBank covered almost 67% of the drugs used for our initial analysis, while the remaining third was split between TTD and PharmGKB with 30% and 4% respectively (Figure 2B). Comparing the databases pairwise, as before, to find the percentage of shared drugs, we observed that TTD shares almost half of its drugs (46%) with DrugBank, in line with the results obtained for drug targets. Interestingly, only one drug (Methyldopa) out of 10,053 was common to all three databases. This comparison of the statistics and coverage of the three drug databases makes it possible to refine the choice and combination of databases to be considered for each analysis.
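The comparison described above amounts to a handful of set operations. The following is a minimal sketch rather than the code used in the study: the function name `overlap_summary` is hypothetical, the gene sets are toy data, and the choice of the union of all targets as the denominator for the percentage is an assumption, since the text does not state how the proportions were normalized.

```python
from itertools import combinations


def overlap_summary(targets_by_db):
    """Summarize target overlap between databases.

    targets_by_db maps a database name to a set of gene symbols.
    Percentages use the union of all targets as the denominator
    (an assumption; the text does not state the normalization).
    """
    union = set().union(*targets_by_db.values())
    common_all = set.intersection(*targets_by_db.values())

    summary = {
        "total_unique": len(union),
        "common_to_all": sorted(common_all),
        "pct_common_to_all": 100.0 * len(common_all) / len(union),
        "pairwise_shared": {},
        "unique_to_db": {},
    }
    # Number of targets shared by each pair of databases.
    for a, b in combinations(sorted(targets_by_db), 2):
        summary["pairwise_shared"][(a, b)] = len(targets_by_db[a] & targets_by_db[b])
    # Number of targets found in one database only.
    for name, targets in targets_by_db.items():
        others = set().union(*(t for n, t in targets_by_db.items() if n != name))
        summary["unique_to_db"][name] = len(targets - others)
    return summary


# Toy data only: these sets are illustrative, not extracted from the databases.
toy = {
    "DrugBank": {"PPARA", "PPARG", "CHRM1", "NR3C1", "HSD11B1"},
    "PharmGKB": {"PPARG", "CHRM1", "ABCB1"},
    "TTD": {"PPARA", "PPARG", "HSD11B1"},
}
result = overlap_summary(toy)
```

The same function applies unchanged to sets of drug names, which is how the drug-level comparison could be reproduced.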

Identification of Therapeutic Targets:

We identified potential therapeutic targets from the candidate genes predicted by Gentrepid for the seven complex diseases. In total, Gentrepid predicted 1,805 candidate genes across all seven diseases. We mapped the drug-target files of the three drug databases to these predicted candidate genes and identified a total of 413 potential therapeutic targets (Fig 3A). Thus almost 23% of the candidate genes predicted by Gentrepid are potential targets for therapeutic treatment using currently available drugs. Individually, CAD and CD recorded the highest percentage of predicted targets, with almost 17% each (Fig 3A). BD had the lowest percentage, with ~8%, which is almost half that of CAD and CD. This may be related to the noisy data obtained for BD in our previous study [10]. Fig 3B shows that the three databases were on par in their contribution to target identification, with a difference of only 4-15% between databases. We expected DrugBank to provide the most targets, as it has the greatest coverage of human targets (Fig 2A). Surprisingly, TTD, which has the least coverage of human targets (Fig 2A), delivered the highest number of potential therapeutic targets in this study (Fig 3B).

Identification of Novel Therapeutic Targets:

We filtered the above pool of 413 targets to identify novel therapeutic targets, i.e. targets which do not yet have any registered therapeutics for our phenotype of interest (one of the seven diseases considered in our study). This resulted in 391 novel therapeutic targets, accounting for almost 95% of the targets identified in the previous section. The remaining 22 targets have either approved therapeutics or therapeutics in ongoing or discontinued clinical trials for our phenotypes of interest (Table 3). Figure 3C shows the number of novel targets obtained for each of the seven diseases from the three databases. The high percentage of novel targets indicates the scope for repositioning: many disease-modifying genes that could serve as potential drug targets are yet to be exploited, and they may help fill the gap in our current understanding of the druggable genome [18]. We estimate that ~22% of the predicted candidate genes can be repositioned as novel therapeutic targets for the phenotypes studied in this work.

Identification of Novel Therapeutics:

We then attempted to identify novel drugs for our phenotype of interest. To do so, we compared the phenotype of interest (from the pool of seven diseases considered in our study) with the indications associated with each drug. In total, we retrieved 10,053 drugs from the three databases and mapped them to the pool of 413 therapeutic targets. This yielded 2,145 (~21%) drugs that act on the potential therapeutic targets. As shown in Figure 3B, over half of these drugs were retrieved from DrugBank (~65%), while the remainder were retrieved from TTD (~34%) and PharmGKB (~10%).

To identify novel drugs, i.e. drugs not currently indicated for our phenotype of interest, we filtered the above list of 2,145 drugs and retrieved 2,132 novel therapeutics. The percentage of all drugs that may be repositioned towards the identified novel targets was thus estimated to be ~21%.
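The proportions quoted in this section follow directly from the reported counts. As a quick check, the short sketch below recomputes them; the counts are taken from the text, while the variable names and the helper `pct` are ours and purely illustrative.

```python
# Counts reported in the text.
candidate_genes = 1805
therapeutic_targets = 413
novel_targets = 391
all_drugs = 10053
drugs_hitting_targets = 2145
novel_drugs = 2132


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


proportions = {
    "targets / candidate genes": pct(therapeutic_targets, candidate_genes),
    "novel targets / targets": pct(novel_targets, therapeutic_targets),
    "novel targets / candidate genes": pct(novel_targets, candidate_genes),
    "drugs hitting targets / all drugs": pct(drugs_hitting_targets, all_drugs),
    "novel drugs / drugs hitting targets": pct(novel_drugs, drugs_hitting_targets),
    "novel drugs / all drugs": pct(novel_drugs, all_drugs),
}

for label, value in proportions.items():
    print(f"{label}: {value}%")
```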

We identified both matches and mismatches between the current drug indication and our phenotype of interest. The mismatches serve as novel therapeutics, whereas the matches relate to the same or similar phenotypes. Table 2A shows the matches. For example, the drug Aleglitazar, in phase II clinical trials for Diabetes Mellitus, Type 2, targets PPARA, one of our predicted candidate genes for Type II diabetes (Additional file 1). Here the phenotype currently associated with the drug and our phenotype of interest are the same. A similar case is Rosiglitazone, which is known to act on the target PPARG for diabetes mellitus and has potential use in our phenotype of interest, Type I diabetes.

In case of mismatches (Table 2B) we found novel therapeutics for the phenotype. For example: Pirenzepine is approved as a therapeutic drug for peptic ulcer disease which acts upon the CHRM1 gene product (Additional file 1). CHRM1 is a predicted candidate gene for Type II diabetes, suggesting that the drug Pirenzepine may be repositioned as a therapeutic for Type II diabetes.

Hence, the therapeutics associated with these novel targets may be repositioned against a phenotype of interest. This freely accessible and straightforward way of identifying potential therapeutic targets can accelerate the drug discovery process.

Validation of predicted therapeutic targets:

We evaluated our predictions using targets cited in the literature for each of the seven diseases. The assessment was based on ROC curves. The ROC curves for the seven complex diseases were created by treating targets cited in at least one publication for the respective disease as true positives and targets without any citations as true negatives.

Supplementary file 2 contains all the ROC curves with their Area Under Curve (AUC) values. The AUC was greater than or equal to 0.7 for each disease, with a 95% confidence interval of (.683, .704). The lowest AUC was obtained for Bipolar disorder (0.7) and the highest for Coronary artery disease (1.00). The lowest value may be attributed to the smaller number of targets identified for BD, as noted in the previous sections. The AUC was also significantly different from 0.5: the p-value was reported as .000 (i.e. < 0.001), meaning that the logistic regression classifies the groups significantly better than chance. This suggests that our predictions of novel therapeutic targets for the seven diseases are highly significant.
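For readers who want to reproduce this kind of assessment, the AUC can be computed from a scored list without any plotting library, using its equivalence to the Mann-Whitney U statistic. This is a sketch under assumptions: the text does not state which score was used to rank the targets, so the `scores` below are hypothetical, and the labels follow the rule described above (at least one disease citation counts as a positive). It does not reproduce the logistic regression or the confidence intervals reported here.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: higher means "more likely a true target" (hypothetical score).
    labels: 1 for targets with >= 1 disease citation, 0 otherwise.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: invented scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_from_scores(scores, labels)
```

An AUC of 0.5 corresponds to ranking positives and negatives no better than chance, and 1.0 to ranking every cited target above every uncited one.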

FDA approved and Clinical trial targets:

We classified the targets as FDA approved targets and clinical trial targets for the seven complex diseases. Figure 4 compares the FDA approved and clinical trial targets listed in the TTD database for Type 2 diabetes with the targets predicted using Gentrepid. The TTD database contains 32 targets for Type 2 diabetes. We identified 28 targets for T2D using Gentrepid, but only three of these (HSD11B1, PPARA, NR3C1) are targeted by drugs currently in clinical trials for T2D. In addition, PPARA is already targeted by FDA approved drugs. Hence, we predicted 25 novel drug targets from the TTD database for Type II diabetes. These approved and clinical trial targets may therefore be suitable for repositioning drugs in the initial phases of clinical trials.

Significance of the Work

The primary purpose of our work was to identify potential therapeutics and therapeutic targets by integrating publicly available genetic and drug data. This method of repositioning can lead to immediate translational opportunities for drug discovery and development [14]. It relies on bioinformatics tools that allow the identification of potential therapeutic targets for complex and other diseases. Although tools are available at present to serve this purpose, they are limited to certain phenotypes. For example, TARGETgene is a bioinformatics tool which identifies and prioritizes potential targets from hundreds of candidate genes, but only for different types of cancer [19]. Another study identified potential drug targets for three neurological disorders: Alzheimer's disease, Schizophrenia and Bipolar disorder. That study predicted candidate genes using the ToppGene and ToppNet prediction systems, but it is restricted to those three disorders [20]. Our method is likewise restricted to the phenotypes considered in the WTCCC study. It is important to note that not all drug repositioning opportunities will be successful, as there are always some limitations.

Conclusions

There is a need to develop new approaches for the identification of therapeutic targets to accelerate the process of therapeutic discovery. In this study, our approach integrates detailed drug data with predicted candidate genes for seven complex diseases. It enables researchers to efficiently identify possible novel therapeutic targets and alternative indications for existing therapeutics. We identified ~22% of the predicted candidate genes as novel therapeutic targets and ~21% of the drugs in the drug dataset as novel therapeutics for the seven complex diseases considered in our study. We utilized both FDA approved drugs and drugs in clinical trials. These drugs may be repositioned against the seven phenotypes of interest, although further investigation is required to verify their action against the potential targets. Gentrepid can thus be utilized as an initial drug screening tool to save time and money spent on the initial stages of drug discovery.

Methods

Materials and Method:

Candidate genes dataset: We used Gentrepid as the gene discovery bioinformatics platform and online drug databases as the repository of drug data. In previous work, we predicted a total of 1,805 candidate genes for seven complex diseases by careful reanalysis of the WTCCC GWAS data [1] using the Gentrepid candidate gene prediction system (Ballouz et al, 2011). In the original analysis, a highly stringent significance threshold (P < 5 x 10^-7) was used in an attempt to correct for multiple testing. This conservative statistical approach, combined with the selection of the nearest-neighboring gene to the significant SNP, resulted in the identification of only a small number of genes, with modest cumulative heritability, associated with each phenotype (Table 1A and B). We specifically addressed these two issues in our reanalysis of this noisy data by:

(a) Considering a series of four thresholds of decreasing stringency, starting with the highly significant threshold used in the original study and decreasing to P < 10^-3. This resulted in a series of four SNP sets containing up to 700 SNPs being considered for each phenotype;

(b) Creating six different search spaces around each SNP cluster, three fixed-width and three proximity-based, which were analyzed with our candidate gene prediction system. Combining the four SNP significance thresholds with the six gene selection methods gave twenty-four search spaces per phenotype. In total, 168 search spaces ranging in size from 2 to 4,431 genes (up to 10% of the genome) were analyzed using Gentrepid.
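The number of search spaces follows from enumerating every combination of phenotype, significance threshold and gene selection method. The sketch below makes that bookkeeping explicit. The threshold labels are taken from the list of abbreviations (HS, MHS, MWS, WS); the six gene selection method names are placeholders, since the individual methods are not all named in this section.

```python
from itertools import product

phenotypes = ["T2D", "BD", "CD", "HT", "T1D", "CAD", "RA"]

# Significance sets, from highly significant (HS) to weakly significant (WS).
thresholds = ["HS", "MHS", "MWS", "WS"]

# Placeholder names: three fixed-width and three proximity-based methods.
gene_selection = [
    "fixed_width_1", "fixed_width_2", "fixed_width_3",
    "proximity_1", "proximity_2", "proximity_3",
]

# One search space per (phenotype, threshold, method) combination.
search_spaces = list(product(phenotypes, thresholds, gene_selection))
per_phenotype = len(thresholds) * len(gene_selection)

print(per_phenotype, len(search_spaces))
```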

Drug-Target dataset: We utilized drug-target data from three publicly available drug databases: DrugBank [15], the Pharmacogenomics Knowledge Base (PharmGKB) [16] and the Therapeutic Target Database (TTD) [17].

DrugBank is a freely available online database that combines detailed drug data with comprehensive drug-target and indication information. In this study, we used the DrugBank drug IDs, drug generic and brand names, to represent drugs and the unique gene symbols to represent protein targets. We extracted 6,711 drug entries active against the 3,411 unique drug targets.

PharmGKB is another drug knowledge base that captures information about drugs, diseases/phenotypes and targeted genes. From this database, we extracted the "drug-associated genes" along with the "description" field, which contains the disease information. We retrieved 382 drugs for 566 drug targets from the PharmGKB database; the targets outnumber the drugs because some drugs target multiple genes.

The Therapeutic Target Database (TTD) is also a freely available online drug database which integrates drug data with therapeutic targets. This database contains 17,816 drugs against both human and non-human (bacterial and fungal) protein targets. We extracted "Drug names" along with "Disease" information and UniProt accession numbers for "targets". UniProt accession numbers were replaced with official HUGO gene symbols using the g:Profiler conversion tool [21]. Finally, we extracted 2,960 drugs for 545 unique human drug targets described in this database.

Mapping of candidate gene dataset with drug target dataset:

We mapped the list of 1,805 candidate genes to the drug-target association files obtained from the three drug databases. Each drug-target association file contained the gene symbols of the drug targets, drug information and the disease associated with each drug. The candidate genes for each disease were mapped separately to the drug-target association file from each database, and the results were retrieved.
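The mapping step is essentially a join on gene symbol. The sketch below shows one way to do it, assuming a tab-separated association file with the columns `gene_symbol`, `drug` and `indication`; this layout and the function names are hypothetical and do not reflect the native export format of any of the three databases. The toy rows reuse drug-target pairs mentioned in the text, and `NOT_A_TARGET` is a made-up symbol standing in for a candidate gene with no known drug.

```python
import csv
import io
from collections import defaultdict


def load_drug_targets(handle):
    """Read a tab-separated drug-target file into {gene: [(drug, indication)]}.

    Assumed columns: gene_symbol, drug, indication (hypothetical layout).
    """
    by_gene = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        by_gene[row["gene_symbol"].strip().upper()].append(
            (row["drug"].strip(), row["indication"].strip())
        )
    return by_gene


def map_candidates(candidates_by_disease, drug_targets):
    """Return {disease: {gene: [(drug, indication), ...]}} for genes with a drug."""
    mapped = {}
    for disease, genes in candidates_by_disease.items():
        mapped[disease] = {
            g: drug_targets[g] for g in sorted(genes) if g in drug_targets
        }
    return mapped


# Toy association file (illustrative rows only).
toy_file = io.StringIO(
    "gene_symbol\tdrug\tindication\n"
    "PPARA\tAleglitazar\tDiabetes Mellitus, Type 2\n"
    "CHRM1\tPirenzepine\tPeptic ulcer disease\n"
    "PPARG\tRosiglitazone\tDiabetes mellitus\n"
)
candidates = {
    "T2D": {"PPARA", "CHRM1", "NOT_A_TARGET"},
    "T1D": {"PPARG"},
}
mapped = map_candidates(candidates, load_drug_targets(toy_file))
```

Running the same mapping once per database and taking the union of the mapped genes would give the pooled set of potential therapeutic targets.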

Identification of novel therapeutics and therapeutic targets:

In the next step, we identified novel therapeutic targets and therapeutics for all seven diseases. If none of the drugs acting on a therapeutic target is registered as a therapy for the phenotype of interest, the target is predicted as a novel therapeutic target, and its associated drug information is retained. The associated drugs are novel therapeutics suitable for repositioning.
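The novelty rule can be expressed as a simple filter over the mapped drug-target pairs. The sketch below is one possible reading of the rule, not the authors' implementation: it matches indications by plain substring search against a hypothetical list of synonyms for the phenotype, and the function name `split_novel` is ours. A real analysis would need a controlled disease vocabulary, because free-text indications vary widely.

```python
def split_novel(disease_terms, gene_to_drugs):
    """Split mapped targets into novel and known for one phenotype.

    disease_terms: lowercase strings that identify the phenotype of interest
        in a free-text indication (hypothetical synonym list).
    gene_to_drugs: {gene: [(drug, indication), ...]}.
    A drug "matches" when any term occurs in its indication; a target is
    novel when none of its drugs match.
    """
    def matches(indication):
        text = indication.lower()
        return any(term in text for term in disease_terms)

    novel_targets, known_targets, novel_drugs = [], [], []
    for gene, drugs in sorted(gene_to_drugs.items()):
        matched = [d for d, ind in drugs if matches(ind)]
        unmatched = [d for d, ind in drugs if not matches(ind)]
        (known_targets if matched else novel_targets).append(gene)
        novel_drugs.extend(unmatched)
    return novel_targets, known_targets, sorted(set(novel_drugs))


# Toy input built from two examples mentioned in the text.
t2d_terms = ["type 2", "type ii"]
t2d_mapped = {
    "PPARA": [("Aleglitazar", "Diabetes Mellitus, Type 2")],
    "CHRM1": [("Pirenzepine", "Peptic ulcer disease")],
}
novel_targets, known_targets, novel_drugs = split_novel(t2d_terms, t2d_mapped)
```

With this toy input, CHRM1 is reported as a novel target with Pirenzepine as a repositioning candidate, while PPARA is reported as a known target because Aleglitazar is already indicated for Type 2 diabetes.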

Validation of predicted therapeutic targets using ROC curve:

The predicted therapeutic targets were further validated using ROC curve analysis. All PubMed IDs of literature related to Bipolar disorder, Type 1 diabetes, Type 2 diabetes, Crohn's disease, Coronary artery disease, Rheumatoid arthritis and Hypertension were first downloaded from PubMed in February 2013. For each target, we calculated the number of citations related to each disease by mapping the extracted PubMed IDs to the gene citation information from Entrez Gene (ftp://ftp.ncbi.nih.gov/gene/), which links genes to the literature that cites them.
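The citation counting step can be sketched as follows. The text gives only the Entrez Gene FTP directory, so the use of NCBI's gene2pubmed file (tab-separated columns tax_id, GeneID, PubMed_ID) is an assumption, as are the function names. The gene and PubMed identifiers in the toy input are invented for illustration.

```python
from collections import Counter


def count_disease_citations(gene2pubmed_lines, disease_pmids, tax_id="9606"):
    """Count, per Entrez GeneID, the cited PubMed IDs that are in disease_pmids.

    gene2pubmed_lines: iterable of tab-separated lines with columns
        tax_id, GeneID, PubMed_ID (the layout of NCBI's gene2pubmed file).
    disease_pmids: set of PubMed IDs (strings) retrieved for one disease.
    tax_id: restrict to one species; 9606 is human.
    """
    counts = Counter()
    for line in gene2pubmed_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        tax, gene_id, pmid = line.rstrip("\n").split("\t")[:3]
        if tax == tax_id and pmid in disease_pmids:
            counts[gene_id] += 1
    return counts


def label_targets(target_gene_ids, counts):
    """Label each target 1 if it has at least one disease citation, else 0."""
    return {g: int(counts.get(g, 0) >= 1) for g in target_gene_ids}


# Toy input: invented GeneIDs and PubMed IDs.
toy_lines = [
    "#tax_id\tGeneID\tPubMed_ID\n",
    "9606\t111\t1001\n",
    "9606\t111\t1002\n",
    "9606\t222\t1003\n",
    "10090\t333\t1001\n",  # non-human entry, ignored
]
disease_pmids = {"1001", "1002"}
counts = count_disease_citations(toy_lines, disease_pmids)
truth = label_targets(["111", "222", "333"], counts)
```

The resulting 0/1 labels are the true positive and true negative assignments that a ROC analysis would then be run against.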

List of abbreviations

GWAS: Genome-wide association studies; WTCCC: Wellcome Trust Case-Control Consortium; CPS: common pathway scanning; CMP: common module profiling; BD: Bipolar disorder; CAD: Coronary artery disease; CD: Crohn's disease; HT: Hypertension; RA: Rheumatoid arthritis; T1D: Type I diabetes; T2D: Type II diabetes; NN: Nearest neighbour approach; BY: Bystander approach; WS: Weakly significant set; MWS: Moderately-weak significant set; MHS: Moderately-high significant set; HS: Highly significant set; TTD: Therapeutic Target Database; PharmGKB: Pharmacogenomics Knowledge Base; DB: DrugBank.

Authors' contributions

MPG carried out the data mining and analysis, and worked on the design of the project. MAW conceived the study, participated in its design and reviewed the results from the data analysis. MPG, MAW, TMC, KAM and CRS helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by the Australian National Health and Medical Research Council [grant number 635512 to M.A.W].
