RD-Connect three years on
May 23, 2016
RD‐Connect is a global research and infrastructure resource for rare diseases. Set up to overcome the siloing, fragmentation and inaccessibility of datasets from different projects, it links omics data with phenotypic data and information in registries and biobanks at both an individual‐patient and whole‐cohort level to enable researchers to analyse their own data and gain a complete view of their disease and patient population of interest. Data shared through RD‐Connect is accessible beyond the usual institutional and national boundaries and researchers across the world can benefit from the opportunity to work with others with an interest in the same field, pool data to create larger cohorts, find confirmatory cases, and access samples for further study.
The project’s objectives are to develop:
an integrated platform to host and analyse data from omics research projects
clinical bioinformatics tools for analysis and integration of molecular and clinical data to discover new disease genes, pathways and therapeutic targets
common standards and data elements for rare disease patient registries
common standards and a sample‐level catalogue for rare disease biobanks
best ethical practices and recommendations for a regulatory framework for linking medical and personal data related to rare disease
RD‐Connect was launched on 1 November 2012 and this report focuses on the project’s third year of operation, from November 2014 to October 2015. The end of the third year marks RD‐Connect’s mid‐way point as an EU‐funded project and is an opportunity to reflect on the successes of the project to date and its priorities for the coming years.
The integrated platform developed by RD‐Connect brings together omics data from participating projects with tools and services to analyse this data online. During the third year of the project, extensive progress has been made in the functionality of the platform. The user interface portal has been launched at https://platform.rd‐connect.eu and currently provides access to the genomics platform, the PhenoTips database that stores phenotypic profiles for the individuals whose genomic data is accessible in the system, and the ID‐Cards catalogue that provides information about patient registries.
The launch of the genomics platform as a fully functional beta release in early 2015 was a major milestone for the project. At year end, with 567 whole‐exome datasets hosted in the genomic data repository and accessible through the secure online analysis interface, the platform is maturing into a robust and useful tool for analysing the genomic data it holds. The data repository has been designed with the size and complexity of whole‐genome data in mind and uses big data technologies such as Hadoop and ElasticSearch to index the data in a form that allows rapid real‐time queries. The system has a Scala API for queries and a sophisticated graphical user interface written in AngularJS that allows users to interact with the platform to perform queries on the data. Users can select one or multiple individuals (e.g. trios or other family relationships) to study and can then filter and refine the results by mode of inheritance, annotation, population frequency, pathogenicity, in silico predictions and gene lists, among others. After the initial filtering, the results are displayed with extensive further details for each variant. These include both annotations held within the system and additional data retrieved on the fly through webservices or links. For example, the platform connects to the DiseaseCard and ALFA webservices to return additional detail that aids in the interpretation of variants and displays the corresponding information in real time. It also provides links to external sources such as the ExAC, UCSC, Ensembl and NCBI browsers to view the variant in its genomic context in other populations, and to OMIM and dbSNP, which provide reports on gene function and reported variants.
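The filtering steps described above can be illustrated with a minimal Python sketch. It is not the platform's Scala API: the `Variant` record, the field names, the two inheritance modes and the 1% frequency threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record layout, not the platform's actual schema.
    gene: str
    position: str            # e.g. "1:12345"
    population_freq: float   # allele frequency in a reference population
    genotypes: dict          # sample id -> number of alternate alleles (0, 1 or 2)


def matches_inheritance(variant, child, mother, father, mode):
    """Check a trio's genotypes against a simplified mode of inheritance."""
    c = variant.genotypes.get(child, 0)
    m = variant.genotypes.get(mother, 0)
    f = variant.genotypes.get(father, 0)
    if mode == "de_novo":
        # Present in the child, absent from both parents.
        return c >= 1 and m == 0 and f == 0
    if mode == "recessive_homozygous":
        # Child homozygous, both parents heterozygous carriers.
        return c == 2 and m == 1 and f == 1
    raise ValueError(f"unknown mode: {mode}")


def filter_variants(variants, child, mother, father, mode,
                    max_freq=0.01, gene_list=None):
    """Keep rare variants that fit the inheritance mode and optional gene list."""
    return [
        v for v in variants
        if v.population_freq <= max_freq
        and (gene_list is None or v.gene in gene_list)
        and matches_inheritance(v, child, mother, father, mode)
    ]
```

Each filter narrows the candidate list independently, so a user can tighten or relax the frequency threshold or the gene list without changing the inheritance check.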
In terms of external access and querying, RD‐Connect has implemented the Global Alliance for Genomics and Health’s “beacon” API and is included in the Network of Beacons (https://beacon‐network.org), which means that the system can be queried externally for the presence or absence of an individual variant in RD‐Connect’s population, returning a yes or no answer. The platform developers are also actively engaged in further developing the Global Alliance’s Matchmaker Exchange API, designed to assess similarity between two individuals in different databases and find confirmatory cases. In partnership with the GENESIS system from the University of Miami, which also holds whole‐exome datasets and phenotypes coded with the Human Phenotype Ontology, RD‐Connect developers are working to extend the API so that it can cope with the “noise” of the huge numbers of variants in every individual by filtering and prioritising on the fly during the query process, to avoid creating matches on benign variants.
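The yes/no behaviour of a beacon can be sketched in a few lines of Python. This is a simplified in-memory illustration, not the Global Alliance specification or RD‐Connect's implementation: the `Beacon` class, the `normalise` helper and the `exists` response field are assumptions chosen for the example.

```python
def normalise(chrom, pos, ref, alt):
    """Bring a variant into one canonical form so lookups are consistent."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), ref.upper(), alt.upper())


class Beacon:
    """Hypothetical beacon over a fixed set of variants."""

    def __init__(self, variants):
        # variants: iterable of (chrom, pos, ref, alt) tuples
        self._index = {normalise(*v) for v in variants}

    def query(self, chrom, pos, ref, alt):
        # Only presence or absence is revealed, never the carrier or a count.
        return {"exists": normalise(chrom, pos, ref, alt) in self._index}
```

Returning a single boolean is the point of the design: an external researcher learns whether a variant has been observed in the population without gaining access to any individual-level data.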
To enable more rapid progress in platform development, in 2015 several small focused hackathons were organised on platform architecture, web‐service integration, ID mapping and security, while the developer conference calls every two weeks served to improve collaboration between the developer teams working at different partner institutions and to keep everyone up to date on the technical progress.
All of these developments have been made possible thanks to continued close collaboration with RD‐Connect’s partner projects NeurOmics and EURenOmics, whose researchers have provided the use cases that the developers have worked with and whose testing of the system and extensive feedback have been used to refine the platform user interface and its functionality. At present all data in the system is from the NeurOmics project, and NeurOmics users have successfully used the platform to identify causative variants in the whole‐exome data of their sequenced patients.
Roll‐out to users beyond NeurOmics was not possible until ethical approval for the database had been obtained by its host institution in Spain, and until granular security and user authentication for the system had been set up. Ethical approval has now been obtained, and with the integration of a Central Authentication Server (CAS) with LDAP authentication and single‐sign‐on set for the end of 2015, the required measures are in place to accept data and enrol users from other projects, and this is scheduled to begin in Q1 2016. Outreach to additional user groups has already begun, with a platform presentation at the European Society of Human Genetics annual conference and many other presentations and personal interactions with data‐generating projects. This must now be consolidated and the additional “promised” datasets (now approaching 2000 individuals) transformed into a reality in 2016.
Aside from the user interface of the platform itself, RD‐Connect aims to develop suites of advanced clinical bioinformatics tools to extract knowledge from high‐throughput experiments, clinical registries and biobanks. Key deliverables were released at month 36 (M36) to help the community analyse NGS data more efficiently and potentially discover new genes responsible for genetic diseases, in particular by annotating variants with an effect on the amino acid sequence (UMD Predictor), splicing (HSF) and regulatory regions (ALFA). These tools are now available as stand‐alone applications, but also as part of the VarAFT system and integrated into the RD‐Connect genomics platform. Additional tools have been developed to facilitate drug design, clinical trial feasibility and patient enrolment for the nonsense read‐through therapeutic strategies that could be applied to any rare disease for which patients harbour a premature stop codon. Together with researchers from NeurOmics and EURenOmics, workflows have been developed to integrate transcriptomics, metabolomics and proteomics data, and semantic tools were used to uncover implicit gene:disease and gene:phenotype associations that will be used for knowledge discovery and automated queries from the RD‐Connect Genomics Analysis Platform.
Biobanks and patient registries
In terms of centralised resource creation, the biobanks and registries work within RD‐Connect has focused on two complementary systems: ID‐Cards, a catalogue providing information about existing registries and biobanks across Europe and beyond, and the sample catalogue, a database providing searchable access to biosample records at an individual sample level. In Year 3 of the project both systems have been further developed and have been linked to each other. The ID‐Cards system has been populated with additional registries in order to provide overview information about each resource and aggregate data on patient numbers. For biobanks, the ID‐Cards invitation system was successfully piloted with two biobanks.
The data model for the sample catalogue, based on the MIABIS (Minimum Information About Biobank data Sharing) data model for sample collections, was refined in collaboration with BBMRI‐ERIC and collaborating biobanks from EuroBioBank. Four biobanks participated in the test phase for the sample catalogue and provided sample datasets to the developer team to allow the system to be refined and methods for data transfer to be established. Further partner biobanks will test the sample catalogue in Q1 2016 prior to its full release to external biobanks. In line with recommendations from the Scientific Advisory Board, the sample catalogue will be developed in a way that allows “offline” biobanks to update the central catalogue by data upload, while “online” biobanks will have a data federation option that allows live querying if their systems are able to implement an API to enable this. The data upload option is being tested, while work on the common API for data federation will continue in 2016.
To avoid duplication of effort and data, interoperability between the sample catalogue and the ID‐Cards system was established. The sample catalogue is now able to retrieve all the information on a biobank from its ID‐Card, while the disease matrix of each biobank in the ID‐Cards system can be automatically populated with aggregated data retrieved from the sample catalogue.
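The two directions of this exchange can be sketched as follows. This is only an illustration under assumed names: the record fields (`sample_id`, `biobank_id`, `disease`) and both functions are hypothetical and do not reflect the actual sample catalogue or ID‐Cards schemas.

```python
from collections import Counter, defaultdict


def build_disease_matrix(sample_records):
    """Aggregate sample-level records into per-biobank counts by disease."""
    matrix = defaultdict(Counter)
    for rec in sample_records:
        matrix[rec["biobank_id"]][rec["disease"]] += 1
    # Convert to plain dicts so the result is easy to serialise.
    return {biobank: dict(counts) for biobank, counts in matrix.items()}


def attach_biobank_info(sample_records, id_cards):
    """Add the biobank description from its ID-Card to each sample record."""
    return [
        {**rec, "biobank": id_cards.get(rec["biobank_id"])}
        for rec in sample_records
    ]
```

Because the biobank description lives only in the ID‐Card and the counts are derived only from sample records, neither piece of information has to be entered twice.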
In the third year the major work with regard to patient registries has also focused on interoperability, with registry systems receiving training and support in making their data interoperable during the “bring your own data” workshops hosted by RD‐Connect jointly with ELIXIR. As a demonstrator or proof of concept, linked data experts began working with end‐users on a use case that requires data to be linked across registries and biobanks. This demonstrator will be ready to present at the 2016 annual meeting and is a prime example of successful collaboration with external data experts from ELIXIR.
Ethical, legal and social issues and patient involvement
In Year 3 progress has been made towards the completion of several tasks that will eventually define the project’s ethical framework. As a sign of the broader acceptance of the ELSI activities undertaken within RD‐Connect, the International Charter of Principles for sharing bio‐specimens and data published in the European Journal of Human Genetics during the previous reporting period has now received the “IRDiRC Recommended” status from the IRDiRC Executive Committee, and a follow‐up paper on consent for effective research has recently been accepted for publication in the same journal. A survey on the feasibility of new consent and re‐consent models has been developed to further explore how to best suit the interests of the consortium.
Results arising from the interaction with relevant stakeholders through workshops and from extensive literature review have led to the development of informational reports. The first of these lays out practices for the involvement of children in longitudinal research, especially with regard to assent and consent. Further research on communicating pre‐manifesting carrier status has been performed and has led to a first‐stage report that will be further developed into a publication.
One matter of immediate practical relevance that had to be resolved in this reporting period concerned the ethical and legal implications of access to the data hosted within the RD‐Connect platform. Here, partners from the ethics and platform work packages worked together to gain local ethics approval for the central platform data repository and to develop a Code of Practice and Adherence Agreement regulating all aspects of access to the platform and ensuring that users who receive authorisation for access to data within the platform fully understand their responsibilities and obligations. With these two key milestones in place, the platform is now in a position to accept data and user registrations from other projects in Q1 2016.
The social and patient‐centred aspects of RD‐Connect’s work are also of key importance to the whole consortium and work has been carried out not only to ensure that patient advocates are kept abreast of RD‐Connect’s activities and that the patient voice is heard within RD‐Connect through the Patient Advisory Council and Patient and Ethics Council, but also to assess how best to train researchers in how to implement a patient‐centred approach in their own research activities.
Impact, outreach and extending collaborations
The keywords for RD‐Connect as a maturing system at the end of Year 3 have to be collaboration, sustainability, outreach and interoperability. Close cooperation with the partner projects NeurOmics and EURenOmics has continued, and end‐users from these projects have been instrumental in tailoring RD‐Connect’s systems to the needs of rare disease researchers. To this end, NeurOmics and RD‐Connect held back‐to‐back annual meetings in 2015 and several RD‐Connect leaders attended the EURenOmics annual meeting to provide updates and training. This will be repeated in 2016, while in 2017 the three projects plan to host a large joint final conference to highlight the research successes of all three projects.
In addition, the third year of the project has seen a marked increase in external collaborations, in particular with the biomedical research infrastructures (RIs) on the European Strategy Forum on Research Infrastructures (ESFRI) roadmap and with the Global Alliance for Genomics and Health. As multinational research consortia with a legal entity status in Europe and a duration that is not time‐limited, the ESFRI research infrastructures, in particular ELIXIR, BBMRI‐ERIC, ECRIN and EATRIS, are important potential mechanisms for future sustainability of RD‐Connect infrastructure elements, as well as a valuable source of expertise in areas of relevance to RD‐Connect. BBMRI‐ERIC is in the process of becoming a full partner of RD‐Connect (amendment submitted; EC decision pending) and will take on responsibility for assessing sustainability options for RD‐Connect’s biobanking resources, while the RD‐Connect coordination office at Newcastle University has taken on the leadership of the rare disease committee within BBMRI UK. A large number of RD‐Connect partners are also ELIXIR members and a number of joint activities have taken place, including an implementation study on linked data approaches for registries and biobanks and co‐hosting of “bring your own data” workshops for researchers working in this area. During this reporting period the biomedical sciences research infrastructures were successful in several EU Horizon 2020 infrastructures funding opportunities, and the resulting large‐scale projects – EXCELERATE (ELIXIR), ADOPT‐BBMRI‐ERIC (BBMRI) and CORBEL (multiple RIs) – not only include the participation of several RD‐Connect partners but also have dedicated rare disease work streams in which RD‐Connect is a lead participant. 
This has resulted in extensive interaction in particular in areas relating to data interoperability and identifiers and brings mutual benefit to both sides, since it is essential for the RI projects to be able to deal with end‐users bringing real data to the table, while for RD‐Connect it brings the opportunity to tap into valuable external expertise in handling large‐scale life sciences data.
Further developments include extensive cross‐talk with participants in the EU public health sphere including the Commission Expert Group on Rare Diseases, the EXPAND project for health data interoperability, and officials involved in the development of a platform for data sharing for the upcoming European Reference Networks. Aligning the research‐focused infrastructure developed by RD‐Connect as closely as possible with the infrastructure in the public health domain is of considerable added value and these discussions will be further developed in 2016 as the public health infrastructure work moves forward.
RD‐Connect continues to interact closely with the IRDiRC, with RD‐Connect coordinator Hanns Lochmüller chairing the IRDiRC Interdisciplinary Scientific Committee and a number of RD‐Connect partners contributing to all IRDiRC committees and engaging with various task forces and the IRDiRC recommended process.
Overall, at the end of its third year, RD‐Connect is increasingly recognised as an important player within the rare disease and genomics fields and is now in a position to consolidate much of the preceding work, reach a wider audience and incorporate additional data. Key points for 2016 include capitalising on the interest shown by other projects in contributing data to RD‐Connect, successful completion of additional functionalities of the platform including the Matchmaker Exchange API, further progress together with ELIXIR experts on interoperability and data linkage, and progress on incorporation of other omics data types, including metabolomics, proteomics and transcriptomics, as well as completing the interface between the sample catalogue and the genomics platform.
NeurOmics and EURenOmics