Data analysis and processing: creation of new bioinformatics tools
The multitude of new ‑omics projects now coming online will generate an unprecedented amount of data. These data are useless unless they can be accurately annotated using standardised codes and classified using up-to-date ontologies.
It is estimated that 80% of rare diseases are of genetic origin. Therefore, genetic databases (patient registries, Locus Specific Databases, dbSNP, OMIM, HGMD, HVP, 1000 Genomes, UCSC, GENATLAS, etc.) play a major role in diagnosis and therapy development for patients with rare diseases and will be utilised in RD-Connect using novel bioinformatics tools.
Because so many players are involved in the rare disease field, data are scattered across multiple databases. These databases now need to be cross-linked to build a comprehensive understanding of rare diseases and to reach the critical mass of information needed to speed up gene discovery and, subsequently, drug development.
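As a rough illustration of what cross-linking can mean in practice, the sketch below groups records from separate sources by a shared identifier. It is a minimal sketch under stated assumptions, not a description of the RD-Connect implementation: the function `cross_link`, the source names, the field names and all records are hypothetical placeholders.

```python
from collections import defaultdict


def cross_link(sources, key="gene"):
    """Group records from several sources by a shared identifier.

    `sources` maps a source name to a list of record dicts. The key name
    and the record layout are assumptions made for this illustration.
    """
    linked = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            linked[record[key]].setdefault(source_name, []).append(record)
    return dict(linked)


# Made-up toy records; the identifiers are placeholders, not real data.
sources = {
    "registry": [
        {"gene": "GENE_A", "patient_id": "P1"},
        {"gene": "GENE_B", "patient_id": "P2"},
    ],
    "variant_db": [
        {"gene": "GENE_A", "variant": "VAR_1"},
    ],
}

linked = cross_link(sources)

# Identifiers that appear in more than one source are the cross-links.
shared = sorted(g for g, by_source in linked.items() if len(by_source) > 1)
print(shared)
```

In this toy case only `GENE_A` appears in both sources, so it is the only cross-linked identifier. Real cross-linking would also have to reconcile differing identifiers and vocabularies between databases, which this sketch does not attempt.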
Although transcripts, proteins and pathways can be predicted based on genetic information to some extent, it is becoming increasingly clear that disease phenotypes are strongly influenced by additional genetic, epigenetic and environmental factors. Therefore, large-scale data related to epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, glycomics, phenomics, secretomics, etc. need to be processed and made available to the rare disease field in a similar way to genomics.
RD-Connect aims to develop sophisticated systems that combine data from these different ‑omics levels in order to facilitate gene and biomarker discovery, drawing on efficient annotation systems and on expert systems able to extract knowledge from data.
- Clinical bioinformatics leaders in RD-Connect will develop a range of new analysis and sharing tools: for DNA variant selection, thanks to a new generation of true pathogenicity prediction tools; for multi-level ‑omics data integration, thus improving the understanding of genotype-phenotype relationships and biomarker discovery; and for decision making in the area of individualised pharmacotherapies. New tools will also be developed to address emerging fields.
- Implementation and sharing of tools as workflows will allow RD researchers to apply command-line tools in a user-friendly environment, in a standardised and reproducible manner, either on the central RD-Connect server or on their own PCs. The automated archiving of analysis procedures as ‘digital materials & methods’ will increase comparability of analysis results and further contribute to data analysis standards.
- A suite of tools will be developed to facilitate cohort selection for trials. The harmonisation of data elements will allow the RD-Connect platform to be used internationally to select clearly defined patient cohorts for trials, which has the potential to dramatically speed up recruitment of otherwise hard-to-find patients.
- Integration of facial phenomics: statistically dense, non-invasive 3-dimensional (3D) facial analyses will provide objective, fine-grained phenotyping to facilitate diagnosis as well as the delineation and monitoring of patient cohorts. The underlying mathematical approaches permit ready integration with other ‑omics data for the above aims, and for exploration of disease biology.
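
The cohort selection described in the list above depends on every record exposing the same harmonised data elements. The following is a minimal sketch of that idea, assuming a hypothetical record layout: the class `PatientRecord`, its fields, the function `select_cohort` and the codes shown are all invented for illustration and do not reflect the actual RD-Connect data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical harmonised data elements, assumed for this sketch only.
    patient_id: str
    diagnosis_code: str  # placeholder for a standardised disease code
    age: int
    consented: bool      # whether the patient agreed to be contacted


def select_cohort(records, diagnosis_code, min_age, max_age):
    """Return consenting patients with the given diagnosis and age range."""
    return [
        r for r in records
        if r.consented
        and r.diagnosis_code == diagnosis_code
        and min_age <= r.age <= max_age
    ]


# Made-up records; because all of them share one layout, records
# contributed by different registries can be filtered by one function.
records = [
    PatientRecord("P1", "DX-001", 12, True),
    PatientRecord("P2", "DX-001", 45, True),
    PatientRecord("P3", "DX-002", 30, True),
    PatientRecord("P4", "DX-001", 20, False),
]

cohort = select_cohort(records, "DX-001", min_age=10, max_age=50)
print([r.patient_id for r in cohort])  # ['P1', 'P2']
```

The point of the sketch is that the filter itself is trivial once the fields are harmonised. Without agreed data elements, each registry would need its own mapping step before any such query could run.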