RD-Connect four years on
January 3, 2017
RD-Connect is a global research and infrastructure resource for rare diseases (RD). It was set up to overcome the siloing, fragmentation and inaccessibility of datasets from different projects. It links omics data with phenotypic data and with information in registries and biobanks, at both the individual-patient and the whole-cohort level. This lets researchers analyse their own data and gain a complete view of their disease and patient population of interest. Data shared through RD-Connect is accessible beyond the usual institutional and national boundaries. Researchers across the world can therefore work with others interested in the same field, relate human phenotypes to a particular gene or pathway of interest, pool data to create larger cohorts, find confirmatory cases, and access samples for further study.
The project’s objectives are to develop:
- an integrated platform to host and analyse data from RD omics research projects
- clinical bioinformatics tools for analysis and integration of molecular and clinical data to discover new disease genes, pathways and therapeutic targets in RD
- common standards and data elements for RD patient registries
- common standards and a sample-level catalogue for RD biobanks
- best ethical practices and recommendations for a regulatory framework for linking medical and personal data related to RD
RD-Connect was launched on 1 November 2012 and this report focuses on the project’s fourth year of operation, from November 2015 to October 2016.
The integrated platform developed by RD-Connect brings together omics and clinical data from participating projects with tools and services to analyse these data online. The central portal, available at http://platform.rd-connect.eu, provides access to three components:

- the genomics analysis interface;
- the PhenoTips database, which stores Human Phenotype Ontology (HPO)-coded phenotypic profiles for the individuals whose genomic data is accessible in the system;
- the catalogue of biobanks and patient registries, known as ID Cards, which provides a publicly accessible, searchable interface summarising each resource and the cohorts it contains.

The biosample catalogue allows drill-down to details of individual samples hosted in participating biobanks. It will be integrated into the main platform IT structure in 2017. The individual components of the integrated platform are described in more detail below.
Genomics pipeline and analysis interface
RD-Connect’s mechanism for sharing and analysing rare disease genomic data begins with submission of the raw .bam or .fastq files. This is essential so that data from multiple sequencing providers can be processed through a standard pipeline, which ensures comparability. The raw data are stored for long-term access at the European Genome-phenome Archive (EGA), a secure, controlled-access repository. The processed data are made accessible online for real-time analysis in the RD-Connect genomics analysis interface. To date (November 2016), the standard pipeline for alignment, variant calling and annotation of raw exome and genome data has been applied to 1142 samples. The fully annotated gVCF results have been made available to authorised users through the online interface. Thanks to the initial collaboration with the two partner projects, NeurOmics and EURenOmics, the platform is a rich resource containing the whole exomes and genomes of a large number of individuals with rare neuromuscular and renal diseases. It is also growing rapidly in other RD areas such as mitochondrial, neurogenetic and immunological disorders. Now that the platform is open to data from all rare disease projects, several thousand more datasets are planned for submission in the coming year.
During the fourth year of the project, the genomics analysis interface has become a mature tool for diagnosis and gene discovery. The user-friendly online system allows users to analyse and query their own data as well as data submitted by others. Data submitted by others becomes accessible to authorised users after a predefined embargo period, which gives the submitting researchers priority access to their own data. A researcher can select one or more individuals (e.g. trios or other family relationships) to study. They can then filter and refine the results by criteria including mode of inheritance, population frequency (for example in ExAC and gnomAD), in silico pathogenicity predictions and gene lists. Once the initial filtering has been performed and the results displayed, extensive further details are provided for each variant.

A new update in 2016 also allows users to search by gene across all cases in the system to find others in which the same gene is affected. This is useful for “matchmaking” to find confirmatory cases for gene discovery. It also allows researchers interested in a particular gene, for example basic scientists studying that gene in an animal model, to find corresponding human cases. A tool for identifying runs of homozygosity has been tested. It has picked up potential consanguineous cases, including a number that the submitter had not flagged as such. The tool is currently undergoing validation to determine whether it provides additional power for gene discovery in such individuals. The Exomiser tool, which prioritises variants based on phenotype and pathogenicity inference, has been implemented in the development version of the interface and will be deployed in the production version in early 2017. The platform has already played a critical role in the discovery of several new RD genes and phenotypes, published in leading peer-reviewed journals.
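The filtering workflow described above can be illustrated with a minimal Python sketch. It is not the RD-Connect implementation. The `Variant` record, the genotype strings, the 1% frequency cut-off and the function names are all hypothetical simplifications.

- `filter_variants` keeps rare, predicted-damaging variants that fit a chosen mode of inheritance.
- `runs_of_homozygosity` flags stretches of consecutive homozygous calls, the kind of signal that can point to consanguinity.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Variant:
    """Hypothetical, simplified variant call (not the RD-Connect data model)."""
    chrom: str
    pos: int
    gene: str
    genotype: str             # "0/1" = heterozygous, "1/1" = homozygous alternate
    pop_af: Optional[float]   # population allele frequency (e.g. gnomAD); None if absent
    predicted_damaging: bool  # summary of in silico pathogenicity predictions


MODES = {"any", "dominant", "recessive_homozygous"}


def filter_variants(variants: List[Variant], mode: str = "any", max_af: float = 0.01,
                    gene_list: Optional[Set[str]] = None) -> List[Variant]:
    """Keep rare, predicted-damaging variants consistent with a mode of inheritance."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    kept = []
    for v in variants:
        # Variants absent from the population database are treated as frequency 0.
        af = v.pop_af if v.pop_af is not None else 0.0
        if af > max_af or not v.predicted_damaging:
            continue
        if gene_list is not None and v.gene not in gene_list:
            continue
        if mode == "dominant" and v.genotype != "0/1":
            continue
        if mode == "recessive_homozygous" and v.genotype != "1/1":
            continue
        kept.append(v)
    return kept


def runs_of_homozygosity(variants: List[Variant], min_calls: int = 5) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) for runs of >= min_calls consecutive homozygous calls."""
    runs: List[Tuple[str, int, int]] = []
    run: List[Variant] = []

    def close_run() -> None:
        # Record the current run only if it is long enough.
        if len(run) >= min_calls:
            runs.append((run[0].chrom, run[0].pos, run[-1].pos))

    for v in sorted(variants, key=lambda x: (x.chrom, x.pos)):
        # A heterozygous call or a change of chromosome ends the current run.
        if v.genotype != "1/1" or (run and run[-1].chrom != v.chrom):
            close_run()
            run = [v] if v.genotype == "1/1" else []
        else:
            run.append(v)
    close_run()
    return runs
```

A production tool would work on full genotype data, including homozygous-reference sites. It would also use physical length or a statistical model rather than a simple count of calls. The count threshold here only keeps the idea visible.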
A straightforward, secure registration process has been set up that allows researchers around the world to carry out research on the platform free of charge.
The platform developers continue to collaborate actively with other international data sharing initiatives. These include the Global Alliance for Genomics and Health (GA4GH) “beacon” API, which enables querying of the presence or absence of a single variant in a population. They also include Matchmaker Exchange, which is designed to assess similarity between two individuals in different databases and find confirmatory cases. Outreach to additional projects has resulted in numerous new collaborations that will bring new datasets to the platform. These are described further in the impact section below.

The collaboration with BBMRI-LPC deserves mention here as a paradigm of good practice for European rare disease data sharing. Researchers across Europe were provided with exome sequencing at no cost through a transnational access mechanism. In return, biosamples had to be made accessible through EuroBioBank biobanks and phenotypic data had to be submitted to the RD-Connect PhenoTips instance. The resulting sequencing data will be automatically submitted to the RD-Connect platform. This workflow lets the researchers analyse their own cases. It also ensures that the samples and data remain accessible to others in future, which maximises the added value of the project for future research. The datasets from this project, numbering almost 1000 cases, will be accessible within the genomics analysis interface from January 2017.
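The beacon idea can be sketched in a few lines of Python. A beacon answers only the question “is this allele present?”. This toy in-memory class is an assumption for illustration only. It does not implement the GA4GH Beacon HTTP API, and the class, field and method names are hypothetical. The point it shows is that the response is a single boolean, so no sample identifiers, genotypes or phenotypes are disclosed.

```python
from typing import Iterable, NamedTuple


class Allele(NamedTuple):
    assembly: str  # reference assembly, e.g. "GRCh37"
    chrom: str
    pos: int
    ref: str
    alt: str


def _normalise(a: Allele) -> Allele:
    # Drop a leading "chr" and upper-case bases so "chr1"/"1" and "a"/"A" compare equal.
    chrom = a.chrom[3:] if a.chrom.lower().startswith("chr") else a.chrom
    return Allele(a.assembly, chrom, a.pos, a.ref.upper(), a.alt.upper())


class ToyBeacon:
    """Hypothetical in-memory 'beacon': answers only whether an allele is present."""

    def __init__(self, alleles: Iterable[Allele]) -> None:
        self._alleles = {_normalise(a) for a in alleles}

    def query(self, assembly: str, chrom: str, pos: int, ref: str, alt: str) -> bool:
        # The answer is a bare yes/no; no sample IDs, genotypes or phenotypes are returned.
        return _normalise(Allele(assembly, chrom, pos, ref, alt)) in self._alleles
```

For example, `ToyBeacon([Allele("GRCh37", "chr1", 12345, "A", "G")]).query("GRCh37", "1", 12345, "A", "G")` returns `True`. Querying the same position on a different assembly returns `False`.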
Biobanks and patient registries
In 2016, activities relating to patient registries focused extensively on two aims: enabling data linkage across resources, and making registry data Findable, Accessible, Interoperable and Reusable (FAIR). This approach is gaining significant traction internationally through the European Open Science Cloud and the NIH Commons. Applying it to patient registries works as follows. Data elements stored in registry and biobank databases are mapped to reference ontologies and made accessible to external queries. Computers can then assist in analyses that would otherwise be impossible, because of incompatibilities between datasets, or would require manual data aggregation and labour-intensive interrogation of the data. At the 2016 annual meeting, a successful proof of concept showed that queries can be run across biobanks and registries by making data machine readable at the source.

In recognition of the importance of this approach, a comprehensive cross-project data linkage plan has now been developed together with ELIXIR and BBMRI. It covers making at least seven more resources interoperable at the source and developing tools to facilitate the procedure. A toolkit for new registries, including advice on interoperability, is under development. The providers of a number of software solutions for patient registries are engaged in this process. Data linkage experts are helping them turn their solutions into FAIR data points that can expose their data to queries securely.
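A small Python sketch can show the benefit of mapping data elements to a shared reference ontology at the source. Everything in it is hypothetical: the two registries and their local diagnosis labels. The codes `ONT:0001` and `ONT:0002` are placeholders standing in for identifiers from a real ontology. Once each registry annotates its own records, a cross-registry count needs no manual harmonisation.

```python
from typing import Dict, List

# Hypothetical local-label -> shared-code mappings. "ONT:0001"/"ONT:0002" are
# placeholders standing in for real ontology identifiers.
LOCAL_TO_ONTOLOGY: Dict[str, Dict[str, str]] = {
    "registry_a": {"DMD": "ONT:0001", "SMA type 1": "ONT:0002"},
    "registry_b": {"Duchenne muscular dystrophy": "ONT:0001",
                   "Werdnig-Hoffmann disease": "ONT:0002"},
}


def annotate_at_source(registry: str, records: List[dict]) -> List[dict]:
    """Each registry adds the shared code to its own records; unmapped labels get None."""
    mapping = LOCAL_TO_ONTOLOGY[registry]
    return [{**r, "disease_code": mapping.get(r["diagnosis"])} for r in records]


def count_cases(code: str, *sources: List[dict]) -> int:
    """Aggregate count across registries, possible only because the codes now agree."""
    return sum(1 for src in sources for r in src if r["disease_code"] == code)
```

A real FAIR data point involves much more than this, for example standard metadata descriptions and access control. The sketch isolates only the mapping step.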
Several “Bring Your Own Data” workshops involving interoperability experts from ELIXIR have helped registry owners understand the principles and importance of interoperability and how to implement these principles for their own data. A hackathon with software developers enabled significant progress on the technical side of FAIR data point creation. This work will continue in 2017 through the implementation of the data linkage plan.
A second focus has been on improving the visibility of rare disease biobanking and registry resources, so that researchers looking for a particular biosample or patient cohort know where to look. To this end, RD-Connect has developed two complementary systems. ID Cards is a catalogue providing information about existing registries and biobanks across Europe and beyond. The sample catalogue is a database providing searchable access to biosample records at the individual-sample level. The ID Cards system currently holds data on 333 rare disease registries. Of these, 170 display both high-level information about the resource itself and a more detailed “disease matrix” containing aggregate data about the cases it contains. To improve the visibility of registries related to the new European Reference Networks (ERNs) for Rare Diseases, each registry has been assigned to a disease group and an ERN group. This should make it easier to locate the cohorts relevant to each ERN.
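A search over ID Cards-style entries can be sketched as follows. The sketch assumes a simplified record with a disease group, an ERN group and a disease matrix of aggregate case counts. The `RegistryCard` class, the `find_cohorts` function and the example group names are hypothetical and do not reflect the real ID Cards schema. Because the matrix holds only aggregate counts, the query locates relevant cohorts without touching individual-level data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RegistryCard:
    """Hypothetical, simplified ID Card entry (not the real ID Cards schema)."""
    name: str
    disease_group: str
    ern_group: str
    # "Disease matrix": aggregate case counts per disease, no individual-level data.
    disease_matrix: Dict[str, int] = field(default_factory=dict)


def find_cohorts(cards: List[RegistryCard], ern_group: str, disease: str) -> List[Tuple[str, int]]:
    """Registries in an ERN group reporting cases of a disease, largest cohort first."""
    hits = [(c.name, c.disease_matrix[disease]) for c in cards
            if c.ern_group == ern_group and c.disease_matrix.get(disease, 0) > 0]
    return sorted(hits, key=lambda h: h[1], reverse=True)
```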
Within RD-Connect’s biobanking activities, the sample catalogue was first launched as a beta release. Technical work this year then focused on importing data from EuroBioBank and on integration with the ID Cards system and the main platform. This included preparing to host both systems within the main platform IT infrastructure in Barcelona. The work involved creating Docker containers for deployment on the Barcelona cluster and integrating with the Centralized Authentication Service. Particular emphasis was placed on the sustainability of the tools. Discussions were therefore initiated to ensure that the tools under development are interoperable, in particular with those being developed by the BBMRI-ERIC IT Common Service.

A further major achievement is the agreement between RD-Connect and EuroBioBank that EuroBioBank will be the de facto biobank network for RD-Connect. While retaining its own identity, EuroBioBank will be fully integrated with RD-Connect services, including the ID Cards and sample catalogues, the quality management procedures and the material transfer standards. The first joint EuroBioBank – RD-Connect meeting took place during the 2016 annual meeting in Barcelona. The working groups to oversee the integration plan were agreed there. This partnership also marks the first recruitment of RD biobanks to contribute sample-level data to the sample catalogue. A workflow for including other RD biobanks has also been developed. As part of this effort, the five-member RD-Connect Panel of Biobank Assessment was established and activated, and the assessment questionnaire was reviewed and revised.
In addition to the central resources offered through the platform interface, RD-Connect partners have developed a number of bioinformatics tools to assist researchers in omics analysis and therapeutic target identification. These tools continue to be refined and extended. For variant analysis, improvements were made in 2016 to three software suites:

- UMD-Predictor predicts the pathogenicity of any exonic substitution.
- Human Splicing Finder (HSF) enables pathogenicity prediction of any gene mutation at the mRNA level.
- VarAFT is a variant annotation and filtration tool that can be run on locally hosted VCF files.

Interactive Biosoftware’s ALFA system for annotating variants in regulatory regions of the genome has also continued to be developed and is available through the RD-Connect platform. Two additional tools have been developed. SKIP-e supports antisense oligonucleotide selection for exon-skipping therapeutic strategies. ePGA (electronic Pharmacogenomics Assistance) provides information about gene-drug interactions.

Work on omics data integration has also continued through exemplar integration projects for pathway and network analysis. It has also continued through the joint multi-omics task force established across RD-Connect, NeurOmics and EURenOmics. The task force aims to share experiences, approaches and workflows in multi-omics integration projects; to organise data workflows; to promote common identifiers and the harmonisation of metadata; and to carry out joint work on showcase projects.
Ethical, legal and social issues and patient involvement
ELSI experts within RD-Connect have been actively engaged in establishing the ethical framework under which RD-Connect can enable the sharing of sensitive human data in a secure and ethical fashion. Building on earlier work within the project, they have drafted a Code of Practice for RD-Connect. It is based on legal requirements and ethical principles as well as patient and scientific needs, and it has been rolled out to new researchers wishing to use the system. All principal investigators requesting access to the RD-Connect platform must accept the Code of Practice on their own behalf and on behalf of their team by signing an Adherence Agreement. Although established specifically for RD-Connect, the Code of Practice is relevant to other disease areas. A practical guide has been produced to support scientists involved in, or setting up, collaborative projects across national borders. The Code may also be relevant to the application of the new EU General Data Protection Regulation (GDPR). RD-Connect will participate in a BBMRI-led consortium to develop a Code of Conduct related to the GDPR.

Synergies with related projects have been exploited wherever relevant, since many ELSI issues faced within RD-Connect are common to all projects dealing with human data, and genetic data in particular. Together with the BBMRI ELSI Common Service, ELSI leaders in RD-Connect have prepared guidance for researchers on the GDPR and how it will affect research with human data and samples. RD-Connect also held an international workshop with the European Academy of Bolzano and the EU COST Action CHIP Me, entitled “Genetic data in public research databases: which governance mechanisms should apply?”. It explored the ethical and legal challenges that arise when researchers are required to deposit genetic and genomic research data in public research databases. It also investigated governance mechanisms that may support ethically and legally compliant data deposition.
A second workshop co-organised with CHIP Me dealt with the challenges of public-private partnerships, taking rare diseases as a case study to explore potential pitfalls and establish best practice.
Input from patient representatives into RD-Connect activities is managed by EURORDIS through two bodies: the Patient Advisory Council (PAC) and the Patient and Ethics Council. Both have been highly active throughout the project and have provided valuable guidance on its direction. This guidance has been particularly important in ethically challenging areas of data sharing, where risks and benefits must be carefully weighed. To give patient representatives more direct and visible input into RD-Connect activities, in 2016 PAC members were nominated to engage with each of the RD-Connect technical work packages. This gives the technical experts direct input from the PAC. It also strengthens the commitment and engagement of PAC members, supports capacity building, and improves dissemination of the project’s outputs to the wider rare disease patient community. Communication of the project’s activities to patients will improve further with the launch of a dedicated patient section on the website, developed and managed by patients themselves.

The two-way exchange of information extends beyond RD-Connect. Patient representatives regularly participate in other European consortia and networks to ensure that rare disease patients’ needs are integrated into the development of best practices in RD research. This includes activities such as the development of the EU Platform on Rare Diseases registration by the European Commission’s Joint Research Centre (JRC), and the BBMRI stakeholders’ forum.
Impact, outreach and extending collaborations
At the end of its fourth year, RD-Connect is a recognised resource within the rare disease genomic research community. The maturity of the central RD-Connect platform as a tool for genomic analysis has facilitated many new data-focused collaborations in 2016. Poster and platform presentations at a number of international conferences, both disease-specific and genetics-focused, have encouraged researchers to submit data to the system, and around 2000 further datasets from new projects are set to arrive in early 2017. While most of the data in the platform to date has previously been analysed with other tools, many of the new datasets will be analysed through the RD-Connect analysis interface for their primary analysis, and the results of this activity in terms of diagnostic outcomes, gene discovery results and user satisfaction will be of crucial importance to cement the place of RD-Connect as a useful tool for rare disease genomics research.
Interactions with the ESFRI research infrastructures, in particular BBMRI and ELIXIR, were further strengthened in 2016. BBMRI-ERIC is now a full partner of RD-Connect. This is a valuable means of ensuring synergy in overlapping activities relating to the existing BBMRI common services for ELSI and IT. It also supports work on sustainability options for RD-Connect’s biobanking resources through a potential BBMRI common service for rare diseases, due to be discussed in 2017. A large number of RD-Connect partners are also ELIXIR members active in the ELIXIR-EXCELERATE project. As a result, many activities relating to that project’s rare disease use case and its interoperability and training work packages are carried out together with RD-Connect, which has been extremely beneficial for both sides.
In the public health sphere, the establishment of European Reference Networks (ERNs) has the potential for substantial synergy with RD-Connect. ERNs are international networks of healthcare providers recognised by their member states as centres of expertise for specific conditions. Although ERNs primarily have a healthcare focus, they must also establish research aims. Most ERN expert centres themselves have substantial research and diagnostic activity and would value the opportunity to collaborate with RD-Connect in sharing research-related clinical, biosample and omics data. RD-Connect has offered all ERN coordinators the opportunity to work together on such activities. It will make its services available to all ERNs successful in the first call (results expected in December 2016).

Relatedly, RD-Connect has offered its ID Cards patient registry catalogue for use as part of the services provided by the European Commission’s Joint Research Centre in Ispra, which has been tasked with providing patient registration activities for the Commission. This has two benefits. It makes use of an established resource rather than duplicating effort by developing a new one, and it could provide a sustainability mechanism for the catalogue in future.
Sustaining the infrastructural resources generated through RD-Connect has been an important topic in 2016 and must continue to be consolidated in the coming year. RD-Connect partners are involved in or leading a number of upcoming grant applications in the H2020 Health and Infrastructures work programmes, some of which may provide sustainability for elements of the central platform and data linkage resources. Other sustainability mechanisms are being actively pursued, including fee-for-service and cost-recovery models, as well as deeper integration with the ESFRI research infrastructures where appropriate.
RD-Connect has continued to interact closely with the International Rare Diseases Research Consortium (IRDiRC). RD-Connect coordinator Hanns Lochmüller chairs the IRDiRC Interdisciplinary Scientific Committee, and a number of RD-Connect partners contribute to IRDiRC committees and engage with various task forces. These include the joint IRDiRC-GA4GH task force on privacy-preserving record linkage. It aims to allow datasets on the same individual to be linked across different databases without revealing the individual’s identity. At EU level, a number of RD-Connect partners are also engaged in the ongoing discussions on potential new funding mechanisms for rare disease research. Here we are strongly advocating two things. First, the use, where appropriate, of existing infrastructures such as RD-Connect and the ESFRI research infrastructures. Second, the establishment of data sharing and interoperability as one of the cornerstones of the initiative.
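One simple approach to privacy-preserving record linkage works as follows. Participating sites agree on a secret key and replace normalised identifiers with a keyed hash (HMAC-SHA256). They then compare only the resulting tokens. The Python sketch below shows that approach. It illustrates one common technique and is not the method of the IRDiRC-GA4GH task force. The function names and example identifiers are hypothetical.

```python
import hashlib
import hmac
import unicodedata
from typing import Iterable, Set


def normalise(value: str) -> str:
    # Strip accents, case and extra whitespace so trivially different spellings match.
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def linkage_token(secret_key: bytes, *identifiers: str) -> str:
    """Keyed hash of the normalised identifiers; without the key it cannot be recomputed."""
    message = "|".join(normalise(i) for i in identifiers).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def link(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> Set[str]:
    """Tokens present in both databases: shared individuals, identified only by token."""
    return set(tokens_a) & set(tokens_b)
```

Exact-match tokens break on typos or transcription differences. Error-tolerant schemes such as Bloom-filter encodings exist for that reason, but they bring additional privacy risks that must be assessed.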
Overall, at the end of its fourth year, RD-Connect is consolidating its position as a key player in the rare disease genomics field. In 2017 the platform must see further use by new projects, including further gene discovery successes from its use as a primary analysis interface. Another important goal is the successful implementation of the data linkage plan, which enables cross-resource queries by making databases interoperable at the source. It will drive forward the interoperability of patient registries across the field. Meanwhile, the full launch of the sample-level biosample catalogue will be an exciting new resource for the RD research community.