The main challenge for in silico genotype-phenotype correlation in any genetic disease is the standardization of phenotype ontology terms. To this end, we are standardizing and classifying all disease-, gene- and mutation-specific PID phenotypes reported in the published literature and in the ImmunoDeficiency Resource (IDR). The resulting phenotype list is then mapped to standard ontologies, namely the Human Phenotype Ontology (HPO), the Human Disease Ontology (DOID) and the Symptom Ontology (SYMP), as provided by the NCBO BioPortal and the EBI Ontology Lookup Service.
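The mapping step can be illustrated with a minimal sketch. It assumes that reported phenotype phrases are matched against ontology labels and synonyms after simple text normalization; the actual RAPID mapping procedure is not described here and may differ. The term IDs, labels and function names below are hypothetical placeholders, not real HPO, DOID or SYMP records.

```python
import re

# Hypothetical ontology excerpt: IDs, labels and synonyms are placeholders,
# not records taken from HPO, DOID or SYMP.
ONTOLOGY_TERMS = [
    {"id": "EX:0000001", "label": "Recurrent bacterial infections",
     "synonyms": ["Frequent bacterial infections"]},
    {"id": "EX:0000002", "label": "Thrombocytopenia",
     "synonyms": ["Low platelet count"]},
    {"id": "EX:0000003", "label": "Eczema",
     "synonyms": ["Atopic dermatitis"]},
]


def normalize(text):
    """Lowercase and turn every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def build_index(terms):
    """Map each normalized label or synonym to its ontology term ID."""
    index = {}
    for term in terms:
        for name in [term["label"], *term["synonyms"]]:
            index[normalize(name)] = term["id"]
    return index


def map_phenotypes(reported, index):
    """Split reported phrases into mapped (phrase -> ID) and unmapped."""
    mapped, unmapped = {}, []
    for phrase in reported:
        term_id = index.get(normalize(phrase))
        if term_id is None:
            unmapped.append(phrase)  # left for manual curation
        else:
            mapped[phrase] = term_id
    return mapped, unmapped


index = build_index(ONTOLOGY_TERMS)
mapped, unmapped = map_phenotypes(
    ["low  platelet-count", "ECZEMA", "short stature"], index)
```

Exact matching after normalization is deliberately conservative: phrases that do not match are returned separately so that a curator can review them rather than having them silently assigned to a near-miss term.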
Our main objective is to organize heterogeneous primary immunodeficiency disease (PID) phenotypic terms into a systematic ontology structure that integrates gene, PID and mutation data in a semantically well-defined form. The ontology is expressed in standardized semantic web formats, the Web Ontology Language (OWL) and the Resource Description Framework (RDF), so that the information can be shared and exchanged freely with other user communities. It can also be exposed through a SPARQL knowledgebase query interface as the basis for a well-informed clinical decision support system. To our knowledge, PID PhenomeR is the first initiative of its kind to integrate and interpret PID data in a user-friendly web interface built on community-driven semantic web technology. A screenshot of the PID PhenomeR home page (http://rapid.rcai.riken.jp/ontology/v1.0/phenomer.php) is shown in Figure 1.
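To illustrate the kind of question a SPARQL interface over integrated gene, PID and phenotype data could answer, the sketch below evaluates a two-pattern query over a handful of hand-entered triples. It is written in plain Python rather than against a real SPARQL endpoint. The "ex:" names, the predicates and the helper functions are assumptions made for illustration and do not reflect the RAPID schema.

```python
# Toy triples (subject, predicate, object). The "ex:" identifiers and
# predicates are hypothetical; the facts are entered by hand for illustration.
TRIPLES = [
    ("ex:WAS", "ex:associatedWith", "ex:WiskottAldrichSyndrome"),
    ("ex:BTK", "ex:associatedWith", "ex:XLinkedAgammaglobulinemia"),
    ("ex:WiskottAldrichSyndrome", "ex:hasPhenotype", "ex:Thrombocytopenia"),
    ("ex:WiskottAldrichSyndrome", "ex:hasPhenotype", "ex:Eczema"),
    ("ex:XLinkedAgammaglobulinemia", "ex:hasPhenotype",
     "ex:RecurrentBacterialInfections"),
]


def match(pattern, triple, binding):
    """Unify one triple pattern with one triple; '?x' items are variables."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:  # variable already bound elsewhere
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples):
    """Join the patterns left to right, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


def genes_for_phenotype(phenotype, triples=TRIPLES):
    # Roughly equivalent SPARQL (prefix declaration omitted):
    #   SELECT ?gene WHERE { ?gene ex:associatedWith ?disease .
    #                        ?disease ex:hasPhenotype ex:Eczema . }
    rows = query([("?gene", "ex:associatedWith", "?disease"),
                  ("?disease", "ex:hasPhenotype", phenotype)], triples)
    return sorted({row["?gene"] for row in rows})
```

Calling `genes_for_phenotype("ex:Eczema")` walks from phenotype to disease to gene, which is the direction a phenotype-driven clinical query would take.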
The overall workflow of PID PhenomeR, shown in Figure 2, comprises collection, mapping, integration, standardization, quality-control measures, generation of ontology files, uploading, and development of unified search options. This kind of analysis should bridge the gap between genotype and phenotype, thereby improving phenotype-based genetic analysis of PID genes. Moreover, it should help clinicians confirm PID diagnoses early and support the implementation of appropriate therapeutic interventions, leading to improved survival and health-related quality of life for PID patients.
Semantic web technology, which provides machine-interpretable descriptions, has been implemented in RAPID for all annotated entries using the Resource Description Framework (RDF) and Web Ontology Language (OWL) file formats. This technology enables automated processing of web-based information and facilitates the exchange and sharing of PID data with interested research groups worldwide.
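As a rough illustration of what a machine-interpretable entry looks like, the following sketch writes one annotated gene as RDF in the N-Triples syntax using only string formatting. The example.org namespace, the hasPhenotype predicate and the phenotype ID are hypothetical; only the rdfs:label IRI is a real, standard vocabulary term. The RDF and OWL files actually distributed by RAPID may be structured differently.

```python
BASE = "http://example.org/pid/"  # hypothetical namespace, not RAPID's
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # standard RDFS term
HAS_PHENOTYPE = BASE + "hasPhenotype"  # hypothetical predicate


def literal(text):
    """Quote a string as an N-Triples literal, escaping \\, \" and newlines."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'


def entry_to_ntriples(gene_symbol, phenotype_ids):
    """Serialize one gene entry: a label plus one triple per phenotype."""
    subject = f"<{BASE}gene/{gene_symbol}>"
    lines = [f"{subject} <{RDFS_LABEL}> {literal(gene_symbol)} ."]
    for phenotype_id in phenotype_ids:
        lines.append(
            f"{subject} <{HAS_PHENOTYPE}> <{BASE}phenotype/{phenotype_id}> .")
    return "\n".join(lines)


# "EX_0000002" is a placeholder phenotype identifier.
print(entry_to_ntriples("WAS", ["EX_0000002"]))
```

Because every statement is a self-contained line of subject, predicate and object, files of this kind can be merged or loaded by other groups without knowledge of the database that produced them.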
Further, the RAPID Phenotype Ontology (RPO), which comprises the mapped PID phenotype terms arranged in a systematic ontology structure and described in the standardized Web Ontology Language (OWL) format, has been uploaded to the NCBO BioPortal as a new ontology at http://bioportal.bioontology.org/ontologies/3114/, and PID PhenomeR has been registered there as a project at http://bioportal.bioontology.org/projects/171.
Our first focus will be on the analysis of primary immunodeficiency diseases because RCAI has been working intensively on PID and has accumulated many lines of information in collaboration with outside researchers. This database will serve as a prototype for other immunodeficiency and immunologic disease databases, and will combine the efforts of scientists in molecular biology, immunology, genomics, proteomics and bioinformatics from other countries, especially in Asia.
Although genotype-phenotype correlation has been established for a few PIDs, early diagnosis of PIDs remains a highly challenging task for clinicians. Many pathway resources are available for depicting cellular signaling cascades and regulatory networks. However, there is a persistent need to construct PID-specific pathways in order to understand and analyze disease pathogenesis. In this context, we propose to develop a web-based integrated PID pathway framework that collects, from the available literature, the molecular events involved in PID pathogenesis, such as protein-protein and protein-DNA interactions, enzyme catalysis, transcriptionally regulated genes and cellular processes. From the collected PID pathway data, 3D pathway diagrams can be drawn with the visualization software tool (Protein Lounge, e-Path), and gene regulatory networks can be depicted with the simulation and modeling software tool CellDesigner. For the analysis of high-throughput data, such as DNA microarray expression data, in the context of PID pathways, we will use the PathVisio tool. This will provide a global view of normal and disease subjects. A successful outcome of this initiative would reveal the dynamics of the signaling and regulatory pathways involved in PID pathogenesis, with the ultimate goal of identifying novel candidate genes and thereby facilitating early clinical intervention and effective treatment for PID patients.
We perform gene enrichment analysis and construct gene network modules from the expression datasets using the WGCNA software package to find disease-related networks (modules) and disease-related hub genes. Functional annotation of these genes with respect to pathways is performed with the GeneGo (http://www.genego.com/) pathway analysis software. This will help us identify potential PID candidate genes within the network modules that define particular PID phenotypes in the constructed PID pathways. We intend to perform the pathway analysis using WGCNA and MetaCore (GeneGo) on the constructed pathway datasets as well as on publicly available microarray expression datasets.
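The idea behind hub-gene detection in a weighted co-expression network can be sketched in a few lines. This is not the WGCNA R package and it omits module detection (topological overlap and hierarchical clustering). It only shows the unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^beta, and the connectivity used to rank genes. The expression values and gene names are made up, and beta = 6 is merely a commonly used setting.

```python
from math import sqrt

# Made-up expression values (one list of samples per gene);
# gene names are placeholders, not real PID genes.
EXPRESSION = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [1, 2, 3, 5, 4],
    "geneC": [1, 3, 2, 5, 4],
    "geneD": [5, 1, 4, 2, 3],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def adjacency(expression, beta=6):
    """Unsigned soft-threshold adjacency: |correlation| raised to beta."""
    genes = list(expression)
    return {g: {h: abs(pearson(expression[g], expression[h])) ** beta
                for h in genes if h != g}
            for g in genes}


def connectivity(adj):
    """Connectivity of a gene: sum of its adjacencies to all other genes."""
    return {gene: sum(neighbors.values()) for gene, neighbors in adj.items()}


def hub_gene(expression, beta=6):
    """Return the gene with the highest connectivity."""
    k = connectivity(adjacency(expression, beta))
    return max(k, key=k.get)
```

Raising correlations to a power suppresses weak, likely spurious correlations while keeping the network weighted, so a hub is a gene that is strongly co-expressed with many others rather than one that merely exceeds a hard cutoff.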
Thus, we anticipate that the ongoing work will update and enrich RAPID as an integrated knowledgebase, so that it will soon be recognized as a PID ready reckoner for all interested groups, including PID clinicians, biomedical investigators and other end users. A brief overview of the RAPID road map toward this knowledgebase is shown in Figure 3.