Second RD-Connect jamboree focuses on variant calling and analysis
From 30 June to 2 July 2014 Christophe Béroud’s team from Aix-Marseille University Medical School hosted 30 participants from Europe, the USA and Australia for the second RD-Connect data analysis jamboree. This year’s event focused on whole exome sequencing variant analysis, in particular the development of a standard variant calling pipeline and the methods for analysis and prioritisation of variants towards finding causative mutations.
After a welcome by Christophe Béroud as host and an update on the International Rare Disease Research Consortium (IRDiRC) by Hanns Lochmüller (Newcastle), David Salgado (Marseille) and Sergi Beltran (Barcelona) presented the outcomes of a benchmarking process using NA12878, a well-known reference sample, which led to the implementation of the first version of the RD-Connect standard analysis pipeline. As a result of the benchmarking of systems in place in Leiden, Groningen, Marseille and Barcelona, the stan
As part of the testing of the pipeline, RD-Connect made use of pilot data provided by the NeurOmics project. The aligned BAM files from NeurOmics were uploaded to the European Genome-phenome Archive (EGA), where they will be stored in perpetuity and will soon become available to any researcher wishing to reuse the NeurOmics data. The files were transferred to RD-Connect (CNAG, Barcelona), where they were converted back to raw FASTQ files and realigned against the reference genome with the first version of the standard analysis pipeline. The next step in the pipeline was to call variants and annotate them using publicly available tools together with tools newly developed within RD-Connect (Marseille). The results of this recalling procedure provided an opportunity for comparison against the original NeurOmics calls.
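To illustrate what comparing a recalled variant set against an original one can involve, here is a minimal Python sketch. It is not the procedure used by RD-Connect, which this report does not describe: the function name, the representation of a variant as a (chromosome, position, reference allele, alternate allele) tuple, the overlap-over-union concordance measure and the example calls are all assumptions made for illustration.

```python
def compare_call_sets(original, recalled):
    """Summarise the overlap between two collections of variant calls.

    Each call is assumed to be a (chrom, pos, ref, alt) tuple. Concordance
    is computed here as shared calls divided by all distinct calls, which is
    one of several possible definitions.
    """
    original, recalled = set(original), set(recalled)
    shared = original & recalled
    union = original | recalled
    return {
        "shared": len(shared),
        "only_original": len(original - recalled),
        "only_recalled": len(recalled - original),
        "concordance": len(shared) / len(union) if union else 1.0,
    }


# Made-up calls, for illustration only.
ORIGINAL_CALLS = [("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("3", 3000, "G", "A")]
RECALLED_CALLS = [("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("4", 4000, "T", "C")]

SUMMARY = compare_call_sets(ORIGINAL_CALLS, RECALLED_CALLS)
```

In practice the calls private to one set are usually the interesting part, since they point to differences in alignment or calling parameters between the two pipelines.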
Victor de la Torre (Madrid) presented the central database for the reprocessed data and the current status of the user interface for the RD-Connect platform, and Mats Hansson (Uppsala) discussed ethical considerations for data sharing and the principles that should be incorporated into data sharing agreements.
The main focus of the jamboree then moved on to the methods for variant prioritisation and analysis, refining the data down from the hundreds of thousands of variants found in a single exome towards the single variant that is the cause of a monogenic disorder. The RD-Connect platform will provide a user-friendly interface for researchers to perform this type of analysis, enabling filtering and prioritisation by different methods and allowing various analysis tools to be plugged in, including commonly used tools and those being developed within RD-Connect.
During the jamboree, 20 use cases from the NeurOmics project were provided to participants for analysis. The cases included sibling pairs, families with several affected individuals, and trios. Some of these cases had already been solved by NeurOmics investigators, while others are still unsolved. Using the sequencing data together with phenotypic information, participants were expected to show the methods they would use to solve the cases. This allowed end-users from NeurOmics and EURenOmics to describe clearly to the developers the features that the platform interface should have in order to facilitate their work. Further work in autumn 2014 will enable these features to be implemented in the user interface, while a command line interface will also allow advanced users to perform their own queries. The first version of the user interface is planned to be ready by the time of the next RD-Connect annual meeting in March 2015.
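As a rough illustration of the kind of filtering described above, the following Python sketch narrows a trio's variants down to rare, potentially damaging candidates that fit a simple inheritance model. It is a sketch under stated assumptions and not the RD-Connect platform's method: the `Variant` fields, the 1% frequency cut-off, the list of damaging consequences, the two inheritance models and the example genes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str  # e.g. "missense", "stop_gained", "synonymous"
    pop_freq: float   # allele frequency in a reference population
    child_gt: str     # genotypes written as "0/0", "0/1" or "1/1"
    mother_gt: str
    father_gt: str


# Assumed set of consequences treated as potentially damaging.
DAMAGING = {"missense", "stop_gained", "frameshift", "splice_site"}


def is_rare_and_damaging(v, max_freq=0.01):
    """Keep variants that are rare in the population and alter the protein."""
    return v.pop_freq <= max_freq and v.consequence in DAMAGING


def fits_trio_model(v):
    """Return the inheritance model the genotypes are consistent with, or None."""
    parents = (v.mother_gt, v.father_gt)
    if v.child_gt == "0/1" and parents == ("0/0", "0/0"):
        return "de_novo"
    if v.child_gt == "1/1" and parents == ("0/1", "0/1"):
        return "homozygous_recessive"
    return None


def prioritise(variants, max_freq=0.01):
    """Return (gene, model) pairs for variants passing both filters."""
    candidates = []
    for v in variants:
        if not is_rare_and_damaging(v, max_freq):
            continue
        model = fits_trio_model(v)
        if model is not None:
            candidates.append((v.gene, model))
    return candidates


# Made-up variants, for illustration only.
EXAMPLE = [
    Variant("GENE_A", "missense", 0.0001, "0/1", "0/0", "0/0"),    # rare de novo
    Variant("GENE_B", "synonymous", 0.0001, "1/1", "0/1", "0/1"),  # not damaging
    Variant("GENE_C", "stop_gained", 0.20, "1/1", "0/1", "0/1"),   # too common
    Variant("GENE_D", "frameshift", 0.0, "1/1", "0/1", "0/1"),     # rare recessive
    Variant("GENE_E", "missense", 0.001, "0/1", "0/1", "0/0"),     # inherited het
]

CANDIDATES = prioritise(EXAMPLE)
```

Real analyses typically add further models (compound heterozygous, X-linked), genotype quality thresholds and phenotype-based gene ranking, none of which are covered by this sketch.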
The jamboree also provided an opportunity for short updates on associated bioinformatics tools: Christophe Béroud presented pathogenicity prediction results from UMD-Predictor and a prototype variant filtration tool developed in Marseille, while Matt Bellgard (Perth, WA) presented the Yabi workflow environment and the second generation of their Rare Disease Registry Framework, and Andreas Zankl (Sydney) presented the BioLarK Archive and Skeletome Knowledge Base. Mark Thompson (Leiden) presented work on integrating datasets from multiple sources using the COEUS tool to create semantic knowledge bases.
Finally, to update participants on progress towards the broader RD-Connect goals, Lucia Monaco (Milan) and Roxana Merino (Stockholm) presented the outcomes of the RD-Connect biobanking work package and the development of both a system for registering biobanks and registries and a sample catalogue using the MOLGENIS system. The sample catalogue will be integrated with the RD-Connect platform so that individual omics datasets can be linked back to their source samples for further research.
The jamboree organisers would like to thank all participants, in particular those from our collaborating projects NeurOmics and EURenOmics, for their constructive feedback and collaborative spirit.