30 January 2015
Funded as part of the EU’s commitment to the International Rare Diseases Research Consortium (IRDiRC), RD-Connect is a global infrastructure project that links omics data with databases, registries, biobanks, and clinical bioinformatics tools into a central research resource for rare diseases. It has been set up to address the problem of fragmentation in the rare disease research field, where individual efforts often have poor interoperability and do not systematically connect data across the levels of clinical phenotype, genetics and omics data, biomaterial availability and research/trial datasets. This data must be linked at both an individual-patient and whole-cohort level to enable researchers to gain a complete view of their disease and patient population of interest.
RD-Connect’s primary objectives are to develop:
- an integrated platform to host and analyse data from omics research projects
- clinical bioinformatics tools for analysis and integration of molecular and clinical data to discover new disease genes, pathways and therapeutic targets
- common standards and data elements for rare disease patient registries
- common standards and a sample-level catalogue for rare disease biobanks
- best ethical practices and recommendations for a regulatory framework for linking medical and personal data related to rare disease
RD-Connect was launched on 1 November 2012 and this report focuses on the second year of operation. At the end of its second year, RD-Connect has successfully achieved its objectives for the period and has made significant progress on the development of the central platform for omics data and the associated bioinformatics tools for data analysis, the integration of biobanks and registries, and the ethical framework for data sharing.
RD-Connect is initially incorporating data generated by EURenOmics, which uses omics approaches to focus on causes, diagnostics, biomarkers and disease models for rare kidney disorders such as steroid-resistant nephrotic syndrome and tubulopathies; and NeurOmics, which uses omics approaches to improve diagnosis and treatment of rare neurodegenerative and neuromuscular disorders such as Huntington’s disease and muscular dystrophies. The strong collaborations with EURenOmics and NeurOmics set up in Year 1 have continued and these ensure that RD-Connect is developing in a direction that enables the platform and tools to be of utility to all associated projects generating omics data.
The integrated platform developed by RD-Connect brings together omics data from participating projects with tools and services to analyse this data online. In 2013, joint task forces with participants from RD-Connect, NeurOmics and EURenOmics were established to take forward various platform-related activities, and these have delivered tangible results in 2014. In particular, the first NeurOmics submission of whole-exome sequencing data to RD-Connect has been completed and has undergone full data processing and analysis within RD-Connect as a pilot case to refine the pipeline and workflow. In line with the workflow agreed with NeurOmics and EURenOmics, raw sequencing data (in BAM format) was submitted by NeurOmics to the European Genome-phenome Archive (EGA), and the corresponding deep phenotypic data was submitted to a dedicated NeurOmics PhenoTips database set up as a collaboration between RD-Connect, NeurOmics and the PhenoTips team. RD-Connect processed the data through the initial version of the Standard Analysis Pipeline (SAP). A pilot set of 43 exomes from 20 solved and unsolved cases was aligned, variant-called and annotated. The results were discussed and validated during the 2014 data analysis jamboree. Using the RD-Connect system, the causal variants were identified for all of the previously solved cases, and candidates were proposed for some unsolved cases. NeurOmics are now routinely uploading raw data to EGA and phenotypic data to PhenoTips, and a second batch will be processed through the SAP in January 2015. EURenOmics are currently testing the EGA upload for their purposes, and EURenOmics data will be integrated into the platform in 2015.
The functionality to be available in the first release of the primary data platform’s web interface has been established, with a focus on information useful for variant prioritisation. Substantial progress has been made on the platform’s underlying technical infrastructure and on the design and development of the data repository. A beta version will be tested by NeurOmics in January 2015, further developed during a hackathon in February 2015, and presented during the joint NeurOmics-RD-Connect annual meeting in March 2015 and the EURenOmics meeting in April 2015. Feedback from end-users will then be used to further refine the systems before the platform is made available to a wider community. The repository is being deployed on a physical five-node Hadoop cluster, and VCF files resulting from the Standard Analysis Pipeline can be uploaded and indexed through Elasticsearch, enabling quick access and queries.
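As a rough illustration of what indexing VCF output involves, the sketch below turns VCF data lines into flat, JSON-serialisable records of the kind a search index could ingest. It is not the RD-Connect implementation: the function names, the record layout, the sample identifier and the example variant are all assumptions made for illustration, and the step that would actually send the records to Elasticsearch is deliberately left out.

```python
import json


def vcf_line_to_doc(line, sample_id):
    """Convert one VCF data line into a flat dict (hypothetical record layout)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, flt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True

    return {
        "sample": sample_id,
        "chrom": chrom,
        "pos": int(pos),
        "id": None if var_id == "." else var_id,
        "ref": ref,
        "alt": alt.split(","),  # ALT may list several alleles
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }


def vcf_to_docs(lines, sample_id):
    """Yield one record per data line, skipping headers and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield vcf_line_to_doc(line, sample_id)


# Made-up example input; the variant and the GENE tag are placeholders.
example_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t12345\t.\tC\tT\t50.0\tPASS\tDP=32;GENE=EXAMPLE\n",
]
docs = list(vcf_to_docs(example_vcf, "case-001"))
print(json.dumps(docs[0], sort_keys=True))
```

Flattening each variant into one record per line keeps fields such as chromosome, position and annotation tags individually queryable, which is the property that makes indexed access quick.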
In terms of broader data linkage and interoperability, initial steps have been taken with NeurOmics and EURenOmics to enable incorporation of metabolomic, transcriptomic and proteomic data generated by these projects. Work towards enabling data linkage with biobank sample data and registry phenotypic data is ongoing and will progress further after the annual meeting in March 2015. Work on data interoperability principles has been taken forward through the Linked Data and Ontologies Task force led by LUMC and UAVR, and specific work has been done to improve nanopublication schemes and provide proof-of-concept examples of this approach, showing its utility to enhance discovery and link data.
In terms of bioinformatics tools and annotation sources, new versions of a number of tools across the various analysis suites have been released. The new version of UMD-Predictor®, based on the hg19 version of the human genome and on Ensembl v71 gene annotations, has been consolidated into a single optimised PostgreSQL database covering all chromosomes. A website is now available with a user-friendly interface. At the same time, the new version of Human Splicing Finder, HSF v3.0, has been released to allow predictions of the impact of mutations on splicing signals. Both systems have demonstrated very high efficiency and publications are underway. In parallel, a document summarising the state of the art regarding currently available data sources and algorithms for annotation of extra-genic and non-coding variations has been produced. This has resulted in the development of the ALFA system for the identification of key regulatory signals that can be disrupted by mutations, including promoter regions, transcription factor binding sites, enhancer regions, CTCF (CCCTC-binding factor) binding sites (insulators) and microRNA target sites.
Web service operations using Concept Profile Analysis technology have been developed that allow retrieval of literature-based fingerprints for lists of genes, proteins and chemical substances from -omics experiments. A proof-of-concept matching experiment showed expected results for a match with the causative gene for Huntington’s Disease (HD). A strategy is now being designed for network-based integration of model organism and human data, integration of microarray data from human HD and control brains, and NGS-based transcriptomics data from blood from HD patients and controls.
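The idea of matching literature-based fingerprints can be illustrated with a small sketch. It assumes, purely for illustration, that a concept profile is a sparse mapping from concept to weight and that two profiles are compared by cosine similarity; the actual weighting and scoring used by Concept Profile Analysis are not described here, and the concept names, weights and gene labels below are invented.

```python
import math


def cosine(profile_a, profile_b):
    """Cosine similarity between two sparse concept->weight profiles."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[c] * profile_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_profile, candidate_profiles):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine(query_profile, profile))
        for name, profile in candidate_profiles.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Invented profiles: weights and concept names are placeholders.
disease_profile = {"chorea": 0.9, "striatum": 0.7, "polyglutamine": 0.8}
gene_profiles = {
    "gene_x": {"polyglutamine": 0.9, "striatum": 0.6},
    "gene_y": {"kidney": 0.8},
}
ranking = rank_by_similarity(disease_profile, gene_profiles)
print(ranking[0][0])
```

In this toy example the gene whose profile shares concepts with the disease profile ranks first, while a gene with no overlapping concepts scores zero.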
An electronic ‘pharmacogenomics assistant’ (ePGA) has been developed to provide personalised drug recommendations based on linked genotype-to-phenotype pharmacogenomics data and to support biomedical researchers in identifying pharmacogenomics-related gene variants. A pilot project on the semantic integration of clinical (phenotype) and genomic (genotype) information from the Australian Skeletome and DMD registries is ongoing, and the COEUS Semantic Web application framework is being further developed and utilised in a number of specific use cases. The SKIP-e bioinformatics system has been developed to assist researchers in selecting antisense oligonucleotides that can be used either as therapeutic agents to restore the production of a functional, partially deleted protein (for example in DMD) or as tools to study gene function by inactivating a protein through nonsense-mediated decay. In 2015 work will continue to make these tools and annotations accessible through the central platform.
Biobanks and patient registries
In 2014 an online searchable catalogue for biobanks and registries has been developed that provides information about the data held by these resources. The catalogue is made up of “ID-Cards” with information on the participating registries and biobanks and the data they hold. In order to encourage participation, the RD-Connect registry group has created ID-Cards for 97 RD registries linked with RD-Connect. For biobanks, the system will be integrated with a searchable database containing sample-level data on rare disease biosamples from participating biobanks. The system is being integrated into the online registration and validation procedure for biobanks wishing to participate in RD-Connect and will be rolled out to biobanks in 2015.
Regarding phenotypic data collection, the Human Phenotype Ontology (HPO) is accepted as the primary phenotype ontology in use in RD-Connect and associated projects, with the Orphanet Rare Disease Ontology (ORDO) used as a standardised nomenclature for diseases. However, it is also of value to understand the broader range of phenotype ontologies and common data elements in use in the rare disease field, and extensive work has been carried out in this area, with several reports being published. Based on scoping work carried out in collaboration with other projects working with undiagnosed diseases, including the NeurOmics project, the Canadian Care for Rare Consortium and the NIH Undiagnosed Diseases Program, an initial set of Standard Operating Procedures for undiagnosed cases was produced, drawing on existing best practice from these initiatives.
The online registration procedure and assessment workflow for incoming biobanks have been finalised, the biobanking operational workflow has been defined and training materials are being drafted. The data model for the searchable sample-level database for biosamples, based on the Minimum Information About BIobank data Sharing (MIABIS) standard for sample collections, has been created and is being tested with data from contributing biobanks. Technical discussions regarding integration of this sample-level data with the omics data in the central platform have been held, and work on this will progress after the annual meeting in March 2015.
Outreach: IRDiRC integration and extending collaborations
To improve integration and avoid duplication, efforts have continued in the second year of the project to connect with other groups engaged in related activities that were not part of the original RD-Connect consortium and to enhance our outreach to other countries. RD-Connect, NeurOmics and EURenOmics held their 2014 annual meetings jointly in Heidelberg, Germany, and RD-Connect and NeurOmics will hold their 2015 meetings back-to-back to allow a joint session to take place. As recommended by RD-Connect’s Scientific Advisory Board (SAB), chaired by Bartha Knoppers, close links with EURenOmics and NeurOmics were followed up on several levels to ensure integration of activities: joint meetings of the SAB, regular calls between the project coordinators, integration of NeurOmics and EURenOmics researchers in beta-testing of the bioinformatics tools and the RD-Connect pipeline, joint training, a joint Project Ethics Council (PEC), and joint work on dissemination and impact. RD-Connect has also built on the collaborations established in the first year, particularly with associated partners Peter Robinson (Human Phenotype Ontology, Berlin), Michael Brudno (the PhenoTips and PhenomeCentral software tools, Toronto) and Morris Swertz (the Genomics Coordination Center, Groningen), and new associated partners have been included within the consortium to reflect their desire to collaborate and provide data. A Memorandum of Understanding has been set up to formalise such collaborations in future.
In its second year of operation, RD-Connect has also reached out to numerous other projects generating rare disease omics data. After the central platform becomes operational in mid-2015, it will host data from multiple sources, which will greatly increase its utility. This collaboration with additional projects operating in the same field will enable researchers to better address shared challenges in rare disease research:
- establishing and providing access to harmonised data and samples
- performing the molecular and clinical characterisation of rare diseases
- boosting translational, preclinical and clinical research
- streamlining ethical and regulatory procedures
In line with these goals, RD-Connect continues to interact closely with the IRDiRC, with RD-Connect coordinator Hanns Lochmüller chairing the IRDiRC Interdisciplinary Scientific Committee and a number of RD-Connect partners being members of IRDiRC committees and working groups and providing input into the IRDiRC’s “gap analysis” and “roadmap” activities in 2014. Additionally, RD-Connect is aligning activities with the Global Alliance for Genomics and Health (GA4GH), which was established in 2013 as a globally collaborative effort aiming to build on existing best practices and approaches to enhance secure and responsible sharing and interpretation of genomic and clinical information. Its working groups span data standards and interoperability, clinical genomics, ethics and regulatory issues, and data security, all of which are relevant to RD-Connect aims. In particular, RD-Connect is active (together with NeurOmics participants) in the Clinical Working Group, and is engaged with the Matchmaker Exchange project to develop an API enabling matching of patients in different systems based on genomic or phenotypic similarity.
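To make the notion of matching patients by phenotypic similarity concrete, the sketch below ranks candidate patients by the Jaccard overlap of their phenotype term sets. This is a deliberately simple stand-in and not the Matchmaker Exchange API or its scoring method: the function names, patient identifiers and term sets are hypothetical, and the HPO-style identifiers are treated as opaque strings.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity between two collections of phenotype term IDs."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_matches(query_terms, candidates):
    """Rank candidate patients by phenotype overlap with the query.

    candidates maps a patient identifier to its set of term IDs.
    Ties are broken by identifier so the ordering is deterministic.
    """
    scored = [
        (patient_id, jaccard(query_terms, terms))
        for patient_id, terms in candidates.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical query and candidates; IDs are used as opaque labels only.
query = {"HP:0001324", "HP:0003560", "HP:0001250"}
candidates = {
    "patient-A": {"HP:0001324", "HP:0003560"},
    "patient-B": {"HP:0001250"},
    "patient-C": {"HP:0001263"},
}
matches = rank_matches(query, candidates)
for patient_id, score in matches:
    print(patient_id, round(score, 3))
```

A production matcher would normally weight terms by their specificity and use the ontology hierarchy rather than exact identifier overlap; exact overlap is used here only to keep the example short.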
Ethical, legal and social issues and patient involvement
The advances taking place in RD-Connect will improve omics research in rare disease but also have societal and ethical implications, in particular for patients with rare diseases. To address these, research work has been carried out on a number of key topics including incidental findings, informed consent, sharing of data and biospecimens and the use of personal unique identifiers.
Regarding data sharing issues, a proposal was developed for an expedient regulatory framework for linking medical and personal data related to RD on a European and global level. A paper (Mascalzoni et al. 2014) that explores the existing regulatory framework has been published in the EJHG. This paper addresses the issue of data and sample sharing and provides both the ethical foundations on which data sharing should be based (Sharing Charter) and a general legal Material and Data Transfer Agreement (MTA/DTA). This first-stage ethical framework is meant to provide a basis to ensure uniformity of access across projects and countries, and may be regarded as a consistent basic agreement for addressing data and sample sharing in RD-Connect.
A stakeholder meeting on informed consent took place in Rome in April 2014 and resulted in guidance on best practice for consent for data sharing. The Patient and Ethics Council (PEC), which acts as an advisory body to the Governing Board, provides a mechanism for collaboration between RD-Connect, NeurOmics and EURenOmics and allows all three projects to offer comment on the ELSI aspects of the project. The PEC submitted a response to the Council of Europe’s public consultation ‘working document for research on biological materials of human origin’.
Together with the PEC, a joint Patient Advisory Council is also proving an important mechanism to structure the input of patient perspectives and expectations in the project. In particular, feedback was received on the expectations and concerns of patients regarding data linkage and the use of unique identifiers. In May and June, focus groups with rare disease advocates were carried out to explore the participants’ views, opinions and experiences around the sharing and exchange of data and specimens for research. A range of issues were identified, and findings from the focus groups will be submitted to an academic journal in the next 6 months.
This advisory group has also provided a platform to build the capacity of patient representatives on the technical, legal and ethical issues surrounding such a project. Members of the council have regularly raised awareness of these issues, and of the project as a whole, among their constituents and the other stakeholders with whom they work.
Further analysis of the ethical hurdles in RD research focused on the issue of return of results. The use of WGS/WES in research and the clinic blurs the divide between the two and, at the same time, challenges the framework(s) traditionally used for obtaining informed consent, specifically regarding return of results. A report presents the current state of the discussion in the literature on return of information and results, with special consideration for issues related to RD patients.
A complete ethical framework will be developed as a result of this work, which includes both theoretical work and stakeholder involvement as well as emerging issues coming from the consortium’s needs including involvement of children in research and risk-based ethical evaluation.
In its second year of operation, RD-Connect has made substantial progress on the development of the central platform for omics data and the associated bioinformatics tools for data analysis, the integration of biobanks and registries, and the ethical framework for data sharing. To ensure dissemination of project results, RD-Connect was presented at numerous national and international conferences, including major events such as the American Society of Human Genetics (ASHG) meeting and the European Conference on Rare Diseases & Orphan Products (ECRD), and the project has now been acknowledged in over 50 peer-reviewed publications. A publication describing RD-Connect’s concept and objectives was published in the Journal of General Internal Medicine (Thompson et al. J Gen Intern Med. 2014). A monthly newsletter is published online and sent out to over 500 subscribers, and the RD-Connect Twitter feed has nearly 500 followers. Over the course of the year, the coordination team has continued to work towards disseminating and communicating the outputs from RD-Connect to the wider rare disease community. To date, we have produced 10 newsletters which contain information about RD-Connect activities as well as other activities of relevance to RD-Connect partners, IRDiRC, NeurOmics and EURenOmics. All publications have been disseminated via the RD-Connect website, newsletter and Twitter feed. Training activities have been organised at the joint project meeting in Heidelberg, and training materials can be found on the RD-Connect website (http://rd-connect.eu/events/training/). A joint project communications team between RD-Connect, NeurOmics and EURenOmics has been established. Linking up with related initiatives continues to be considered crucial to the project’s future impact, success and sustainability, and efforts in this regard have continued in 2014.
RD-Connect partners are involved in several Horizon 2020 project proposals and are participating in the Global Alliance for Genomics and Health (GA4GH), particularly in the Clinical Genomics Working Group and the Matchmaker Exchange API project. Important links with the European Research Infrastructures (RIs) on the ESFRI roadmap have been initiated in 2014, and links will be further pursued in 2015.
NeurOmics and EURenOmics
To find out what NeurOmics and EURenOmics have been up to over the past year please follow the links below: