1.1.1.How does data sharing and analysis work in GPAP?
1.1.2.Who uses RD-Connect GPAP?
1.1.3.Are there any charges?
1.1.4.Can RD-Connect sequence my samples?
1.1.6.Does the GPAP contain any animal data?
1.1.7.I am a basic scientist and don’t have any human data to share. Can I still register?
1.2.1.How can I register with the RD-Connect GPAP?
The GPAP is available free of charge to researchers and clinicians who have been validated through the process established by our ethical-legal experts. For research groups, the PI must be verified first and can then enrol the other members of the group, provided the PI has the authority to “vouch for” them. User registration and validation involves three steps. First, you need to register at [url]. We’ll ask you to provide proof of your credentials, a scan of an identity document (e.g. passport) and a description of your research interests, so that we can verify that you are a researcher or clinician. You will also need to read and accept the RD-Connect Code of Conduct by signing an Adherence Agreement (example here). Your application will be considered by the Data Access Committee (DAC), and you will receive a response within 10 working days.
Second, you will need to provide phenotypic information about each of your patients (at least five Human Phenotype Ontology (HPO) terms per patient) in the RD-Connect PhenoTips instance. This step must be completed before you can upload the genomic data of the patients in question.
Third, you will transfer a copy of your raw sequencing data. Once they are securely stored and processed through the RD-Connect pipeline, you will be able to analyse them in the GPAP. Validated users are granted access for 12 months and their use of the system is monitored to ensure that it is in line with their stated research interests.
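To illustrate the phenotype requirement in the second step, the sketch below checks that a patient record carries at least five distinct HPO terms before genomic data for that patient would be accepted. It is a hypothetical example, not GPAP code: the function name and record layout are invented, and it only assumes that HPO identifiers follow the usual `HP:` prefix plus seven digits.

```python
import re

# HPO identifiers have the form "HP:" followed by seven digits, e.g. "HP:0001250".
HPO_ID = re.compile(r"HP:[0-9]{7}")

# Minimum number of distinct HPO terms per patient before genomic upload.
MIN_HPO_TERMS = 5


def ready_for_genomic_upload(hpo_terms: list[str]) -> tuple[bool, str]:
    """Hypothetical pre-upload check on one patient's phenotype record."""
    malformed = [t for t in hpo_terms if not HPO_ID.fullmatch(t)]
    if malformed:
        return False, f"malformed HPO IDs: {malformed}"
    distinct = set(hpo_terms)  # a term entered twice only counts once
    if len(distinct) < MIN_HPO_TERMS:
        return False, f"only {len(distinct)} distinct HPO terms, need {MIN_HPO_TERMS}"
    return True, "ok"


patient = ["HP:0001250", "HP:0001263", "HP:0000252", "HP:0001290", "HP:0000486"]
print(ready_for_genomic_upload(patient))      # (True, 'ok')
print(ready_for_genomic_upload(patient[:3]))  # (False, 'only 3 distinct HPO terms, need 5')
```

Counting distinct terms rather than list entries keeps a duplicated term from satisfying the minimum.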
1.2.2.What does signing up to the Code of Conduct really mean?
1.2.3.What do I need to do before I upload data to RD-Connect?
1.2.4.What kind of data do you accept in the GPAP?
We accept human genomic, transcriptomic and phenotypic data meeting the following requirements:
- Each file or dataset must be about an individual person, not an aggregate dataset that contains data from multiple individuals.
- We only accept datasets on individuals with a rare disease or their family members (affected or unaffected).
- We accept both solved and unsolved cases.
- We accept whole-exome, whole-genome and gene panel sequencing data.
- You must be able to send us the “raw data” from the sequencing experiment (FASTQ or untrimmed BAM format).
- Each genomic or transcriptomic dataset must be paired with detailed clinical information (information about the patient’s phenotype). We ask you to enter this in PhenoTips, an online interface that allows it to be converted into Human Phenotype Ontology (HPO) terms. See our video tutorial and PhenoTips guide.
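The acceptance criteria above can be read as a checklist. The following is a minimal sketch under stated assumptions: the `Submission` fields and `acceptance_problems` function are hypothetical, and it only models the genomic sequencing types listed (transcriptomic data, which is also accepted, is not modelled). Solved and unsolved cases are both accepted, so case status is not checked.

```python
from dataclasses import dataclass

# Sequencing types and raw formats taken from the list of requirements.
ACCEPTED_SEQUENCING = {"whole-exome", "whole-genome", "gene-panel"}
ACCEPTED_RAW_FORMATS = {"fastq", "bam"}  # BAM files must additionally be untrimmed


@dataclass
class Submission:
    # Hypothetical fields, chosen only for this sketch.
    individuals: int            # must be 1: aggregate datasets are not accepted
    rare_disease_related: bool  # patient with a rare disease, or a family member
    sequencing: str
    raw_format: str
    bam_trimmed: bool = False
    has_phenotype: bool = False  # paired clinical information entered in PhenoTips


def acceptance_problems(s: Submission) -> list[str]:
    """Return a list of reasons the dataset would be rejected (empty if none)."""
    problems = []
    if s.individuals != 1:
        problems.append("dataset must describe exactly one individual")
    if not s.rare_disease_related:
        problems.append("individual must have a rare disease or be a family member")
    if s.sequencing not in ACCEPTED_SEQUENCING:
        problems.append(f"unsupported sequencing type: {s.sequencing}")
    if s.raw_format not in ACCEPTED_RAW_FORMATS:
        problems.append(f"raw data must be FASTQ or untrimmed BAM, got {s.raw_format}")
    elif s.raw_format == "bam" and s.bam_trimmed:
        problems.append("BAM files must be untrimmed")
    if not s.has_phenotype:
        problems.append("missing paired phenotypic data")
    return problems
```

For example, a single-patient exome submitted as FASTQ with phenotypes in PhenoTips yields an empty list, while an aggregate VCF without phenotypes yields three problems.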
1.2.5.What are the options for uploading genomic and phenotypic data?
A dataset consists of a patient’s phenotypic data, in the form of Human Phenotype Ontology (HPO) terms, together with their exome or genome data.
There are two options for uploading datasets.
- Data collections from centres that will upload fewer than 100 datasets: datasets are uploaded using the standard RD-Connect GPAP upload interface, which includes user-friendly PhenoTips templates for entering phenotypic information and user-friendly tables for uploading genomic data and metadata. An option to bulk upload the genomic data and metadata is also available.
- Data collections from centres that will upload 100 or more datasets: please contact CNAG-CRG in Barcelona ([email protected]) to discuss a customised bulk upload option for large data collections.
All sequencing data is submitted as raw data in FASTQ (or BAM) format and is processed through the standardised pipeline. For more information, see the section “Data processing and storage” below.
All phenotypic data are submitted as HPO terms. Diseases are classified using OMIM/ORDO. Both phenotype and disease classification are entered in the RD-Connect PhenoTips instance, which makes it simple for clinicians to code their data in this way. Clinical interpretation of the data (final or temporary) can be entered in the GPAP, but this is not required at upload. See how to upload data in PhenoTips.
1.2.6.How is pedigree information collected?
1.3.1.What happens to the genomic data I upload?
Once you upload your data to GPAP, the following steps happen:
1. Processing. Data is put through a standard analysis pipeline involving alignment, variant calling and annotation, resulting in a .bam file containing all read data aligned to the reference genome, and a .gvcf file containing the called variants, the positions called as reference, and annotations from the pipeline.
2. Embargoed availability (availability to the submitter’s group alone). As soon as we have processed the data you have submitted, we will send you an email to let you know it is ready for you to view in the GPAP. When the processing is complete, two things happen:
- The original submitted raw data file (.fastq or .bam) and the .bam file produced from the RD-Connect standard analysis pipeline are submitted to the European Genome-phenome Archive (EGA), for long-term storage on behalf of the original submitter and under the submitter’s ownership.
- The processed data is made available to the submitter through the GPAP for analysis by the submitter’s group. Ownership is assigned to the group of the submitting PI. If you have requested an embargo period for the data you submitted, then at this point the datasets at the EGA are “invisible” in the EGA system and cannot be requested by external users, and the datasets in the GPAP are only available to members of your group. This is the period during which your group has exclusive access to the data and can carry out its primary analysis.
3. Controlled-access availability. After the expiry of the embargo period, two things happen:
- The datasets submitted to the EGA become “visible” in the EGA catalogue, allowing EGA users to know they exist. If an EGA user wishes to access your dataset, they must request access via the standard EGA request mechanism, meaning the data access committee you nominated will approve or deny every request.
- The corresponding datasets in the GPAP are now accessible to other authorised users for queries. During this time, you can see which other users have queried your data. You may receive requests for collaborations if other researchers find a variant or phenotype of interest in your data, in the same way that you may already have contacted other users if you found something of interest in their data when comparing it with your own.
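The stages above can be summarised as a small access model. This is a hypothetical sketch, not GPAP code: `Phase`, `current_phase`, `gpap_can_query`, `ega_visible` and `external_ega_access` are invented names, and it assumes (as the text implies but does not state outright) that a dataset for which no embargo was requested goes straight to controlled access.

```python
from enum import Enum


class Phase(Enum):
    EMBARGOED = "embargoed"    # processed, embargo requested and not yet expired
    CONTROLLED = "controlled"  # embargo expired (or, by assumption, never requested)


def current_phase(embargo_requested: bool, embargo_expired: bool) -> Phase:
    if embargo_requested and not embargo_expired:
        return Phase.EMBARGOED
    return Phase.CONTROLLED


def gpap_can_query(phase: Phase, user_group: str, owner_group: str, authorised: bool) -> bool:
    """Can this GPAP user query the processed dataset?"""
    if not authorised:
        return False  # only validated users can use the GPAP at all
    if user_group == owner_group:
        return True   # the submitting PI's group has access in both phases
    return phase is Phase.CONTROLLED


def ega_visible(phase: Phase) -> bool:
    """Is the dataset listed in the EGA catalogue?"""
    return phase is Phase.CONTROLLED


def external_ega_access(phase: Phase, dac_approved: bool) -> bool:
    """External EGA users also need approval from the DAC nominated by the submitter."""
    return ega_visible(phase) and dac_approved
```

Keeping the GPAP and EGA rules in separate functions mirrors the text: after the embargo, GPAP queries open to all authorised users, while EGA downloads still go through the submitter's Data Access Committee.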
1.3.2.How are data processed?
1.3.3.Where is genomic data stored?
The raw genomic data are stored for long-term access at the European Genome-phenome Archive (EGA), a secure, controlled-access repository. The EGA archives data from publications and from case-control, population and family studies at several levels. These include raw data, which allow future reanalysis with other algorithms, and genotype calls, i.e. information about genetic variants such as single nucleotide variants (SNVs) and copy number variants (CNVs), provided by the data submitters.
The EGA provides the security needed to control access to the data and maintain patient confidentiality. Data can be accessed only by authorised researchers and clinicians. In all cases, data access decisions are made not by the EGA but by an appropriate Data Access Committee, which can be the person or group submitting the data.
When uploading a dataset to the GPAP, the user can indicate whether the dataset is already available at the EGA and provide the corresponding reference number. For datasets not yet available at the EGA, CNAG-CRG will broker the submission of the data and metadata uploaded to the GPAP to the EGA. The original data submitter will be responsible for decisions about future access to their datasets.
The processed data (the called variants, not the raw data) is securely stored on the GPAP servers at CNAG-CRG in Barcelona.
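The processed data produced by the pipeline includes a .gvcf file, which mixes called variants with blocks of positions called as reference. As an illustrative sketch, assuming the common gVCF convention in which reference blocks carry only a symbolic ALT allele (`<NON_REF>` in GATK output, `<*>` in bcftools output) plus an `END` INFO key, the hypothetical function below tells the two kinds of record apart. It is not taken from the RD-Connect pipeline.

```python
# ALT values that mean "no real alternate allele at this record".
SYMBOLIC_REF_ALTS = {"<NON_REF>", "<*>", "."}


def classify_gvcf_line(line: str):
    """Return None for header lines, else (kind, chrom, start, end).

    kind is "variant" or "reference_block"; positions are the 1-based
    VCF POS values, and end is inclusive.
    """
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    start = int(pos)
    real_alts = [a for a in alt.split(",") if a not in SYMBOLIC_REF_ALTS]
    if not real_alts:
        # Reference block: END records the last position covered by this record.
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        return ("reference_block", chrom, start, int(info_map.get("END", start)))
    return ("variant", chrom, start, start + len(ref) - 1)
```

Separating the two record types matters because a reference block is positive evidence that a position was covered and matched the reference, which a plain VCF of variants alone cannot express.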
1.3.4.Can the GPAP data be reused in different systems?
1.4.1.Who analyses the data that I submit to GPAP?
Short answer: as the submitter, you do!
Longer answer: The primary goal of the GPAP is to give you, as a rare disease researcher, the tools to better analyse your own patients: control over the primary data analysis remains with you. In addition, the GPAP enables the data you submit to be shared with other researchers who have access to the system. If you wish, you can request a 6-month embargo period, during which you and your team have exclusive access to the data. After this period, the data becomes accessible to other users. This allows you to find confirmatory cases or second families for your interesting candidate genes, and it also allows groups with completely different research questions to benefit from your data.
From time to time, the GPAP development team in Barcelona may run analyses on your data, to test the tools that we incorporate into the system and, in some cases, to provide you with interesting results or candidates that you might wish to follow up. For example, with our runs of homozygosity tool, we were able to point out to some submitters that some of their cases were consanguineous, although this was not known at the time of submission; this allowed interesting candidates to be found in homozygous regions. When we do this, we provide the results to you as the submitter so that you can follow them up: we would not publish gene discovery or case report papers on your patients ourselves. We do have an interest in publishing aggregate data, as well as methodological and statistical papers that show the value of the system. In these cases, we acknowledge all data submitters.
Clinical interpretation of the data (final or temporary) can be entered in the GPAP but this is not a requirement at upload.
1.4.2.What analyses can the GPAP do?
Several built-in features help users identify disease-causing genetic variants and solve even difficult cases.
Basic filters: Users can analyse one or multiple individuals, e.g. a patient and their family members. The results can be filtered by quality, mode of inheritance (e.g. autosomal dominant, homozygous recessive, compound heterozygous), population frequencies in control datasets (e.g. 1000 Genomes Project, gnomAD), and known or predicted pathogenicity according to the ClinVar database and several in silico predictors.
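To show how such filters combine, here is a minimal sketch for a single proband. The `Variant` fields, the quality cut-off of 30 and the 1% population frequency cut-off are illustrative assumptions, not GPAP defaults, and the gene names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical, simplified record; real variant annotations are much richer.
    gene: str
    genotype: str    # "0/1" heterozygous, "1/1" homozygous alternate
    qual: float      # variant call quality
    gnomad_af: float  # population allele frequency (0.0 if absent)
    clinvar: str     # e.g. "pathogenic", "benign", "" if not in ClinVar


def passes_basic_filters(v: Variant, min_qual: float = 30.0, max_af: float = 0.01) -> bool:
    """Quality, rarity and ClinVar filters (thresholds are illustrative)."""
    return (
        v.qual >= min_qual
        and v.gnomad_af <= max_af
        and v.clinvar not in {"benign", "likely_benign"}
    )


def homozygous_recessive(variants: list[Variant]) -> list[Variant]:
    return [v for v in variants if v.genotype == "1/1" and passes_basic_filters(v)]


def compound_heterozygous(variants: list[Variant]) -> dict[str, list[Variant]]:
    """Genes with two or more filtered heterozygous variants.

    Without parental genotypes this cannot confirm the two variants are on
    different alleles; in a trio, one should be inherited from each parent.
    """
    by_gene: dict[str, list[Variant]] = {}
    for v in variants:
        if v.genotype == "0/1" and passes_basic_filters(v):
            by_gene.setdefault(v.gene, []).append(v)
    return {g: vs for g, vs in by_gene.items() if len(vs) >= 2}
```

Applying the quality and frequency filters before grouping by gene matters for the compound heterozygous mode: a common or low-quality second hit should not make a gene look like a candidate.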
Runs of Homozygosity (RoH): RoH analysis can identify consanguineous cases even when the clinician has not recognised them as such. It narrows the search down to the homozygous regions of the patient’s genome, which in such cases are more likely to contain the disease-causing variant.
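The idea behind RoH detection can be sketched with a naive scan: on one chromosome, find maximal stretches of called sites with no heterozygous genotype, and keep those that are long enough. This is a simplified stand-in, not the GPAP tool (dedicated RoH callers typically use statistical models such as hidden Markov models), and the thresholds of 50 sites and 1 Mb are illustrative assumptions.

```python
def is_het(gt: str) -> bool:
    """True for genotypes with two different called alleles, e.g. "0/1" or "1|2"."""
    alleles = [a for a in gt.replace("|", "/").split("/") if a != "."]
    return len(set(alleles)) > 1  # missing calls ("./.") do not break a run


def runs_of_homozygosity(calls, min_sites=50, min_length_bp=1_000_000):
    """Naive RoH scan over one chromosome.

    calls: (position, genotype) pairs sorted by position.
    Returns (start, end, n_sites) for each qualifying run.
    """
    runs = []
    start = end = None
    n_sites = 0
    # A trailing heterozygous sentinel closes the final run.
    for pos, gt in list(calls) + [(None, "0/1")]:
        if is_het(gt):
            if n_sites and n_sites >= min_sites and end - start + 1 >= min_length_bp:
                runs.append((start, end, n_sites))
            start, end, n_sites = None, None, 0
        else:
            if n_sites == 0:
                start = pos
            end = pos
            n_sites += 1
    return runs
```

Requiring both a minimum number of sites and a minimum physical length avoids calling short stretches that are homozygous only because few variable sites were observed there.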
Phenotypic data: The Platform links each individual’s sequencing data to detailed clinical information about their symptoms (deep phenotyping), stored as Human Phenotype Ontology (HPO) terms in the PhenoTips database. Users can refine the genomic analysis results by selected HPO-encoded symptoms, or by genes associated with a specific disease in the OMIM (Online Mendelian Inheritance in Man) database.
Built-in Exomiser: The fully integrated Exomiser tool automatically extracts clinical information from PhenoTips and highlights the candidate variants that best match the patient’s symptoms.
Patient matchmaking: Users can find other individuals with variants affecting the gene of interest. The presence of similar symptoms in the matching patients is a strong hint that a given variant is disease-causing and helps confirm the genetic diagnosis. Matchmaking can also help basic researchers learn how mutations in the gene they study affect humans.
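A minimal sketch of the matchmaking idea, under simplifying assumptions: each patient is reduced to a set of candidate genes and a set of HPO terms, and phenotypic similarity is measured with the Jaccard index of the HPO sets. Production matchmaking tools use ontology-aware similarity measures instead; the function names, patient IDs and gene names here are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared terms divided by all terms across both patients."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_patients(query_gene: str, query_hpo: set[str], cohort: dict) -> list[tuple[str, float]]:
    """cohort maps patient_id -> (set of candidate genes, set of HPO terms).

    Returns (patient_id, similarity) for patients with a candidate variant in
    query_gene, most phenotypically similar first (ties broken by ID).
    """
    hits = [
        (pid, jaccard(set(query_hpo), hpo))
        for pid, (genes, hpo) in cohort.items()
        if query_gene in genes
    ]
    return sorted(hits, key=lambda h: (-h[1], h[0]))
```

Filtering by gene first and ranking by phenotype second reflects the reasoning in the text: a shared gene alone is weak evidence, but a shared gene together with overlapping symptoms is a strong hint.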
Connection to other databases: To make gene discovery easier, all variants on the results list have direct links to variant descriptions in external databases, such as OMIM, PubMed and Ensembl.
RD-Connect bioinformatic tools: In addition, RD-Connect has developed several bioinformatic tools to help researchers analyse omics data and identify targets for potential therapies.
To learn how to perform those analyses, watch our GPAP video tutorials.
1.4.3.Matchmaking and discovery
GPAP is integrated in the Beacon Network (https://beacon-network.org), a project by the Global Alliance for Genomics and Health (GA4GH).
GPAP participates in MatchMaker Exchange (MME). Data submitters can push patient profiles directly from the GPAP PhenoTips instance to PhenomeCentral. A dataset is discoverable through MME only if its submitter explicitly enables this, either at the time of submission or later in the data management portal; the submitter can also disable this permission.
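A beacon answers a deliberately narrow question: does any dataset contain a given allele? The toy class below illustrates that yes/no contract with an in-memory set. It is a hypothetical sketch, not the Beacon API or GPAP's implementation; the 1-based coordinates and the assembly string are assumptions of the example (the Beacon specification defines its own request fields and coordinate conventions).

```python
from typing import NamedTuple


class Allele(NamedTuple):
    chrom: str
    pos: int  # 1-based position (an assumption of this sketch)
    ref: str
    alt: str


class ToyBeacon:
    """Answers only 'is this allele present?', never which patient carries it."""

    def __init__(self, assembly: str):
        self.assembly = assembly
        self._alleles: set[Allele] = set()

    def add(self, allele: Allele) -> None:
        # Normalise bases to upper case so queries are case-insensitive.
        self._alleles.add(allele._replace(ref=allele.ref.upper(), alt=allele.alt.upper()))

    def query(self, assembly: str, chrom: str, pos: int, ref: str, alt: str) -> bool:
        if assembly != self.assembly:
            raise ValueError(f"this beacon serves {self.assembly}, not {assembly}")
        return Allele(chrom, pos, ref.upper(), alt.upper()) in self._alleles
```

Returning only a boolean is the point of the design: it lets others discover that an allele exists somewhere, after which they can follow up through controlled channels rather than seeing any patient data.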
1.4.4.Technical aspects
For more technical FAQ, such as which version of the human genome and transcript set are used in the GPAP, please visit https://platform.rd-connect.eu/faq.html.
1.4.5.Why can’t one ID be used for all data coming from the same patient in different databases?
1.5.1.Who can access the data in GPAP and on what conditions?
1.5.2.Are there different levels of access to data?
1.5.3.Embargo periods
The embargo period starts at the moment a complete dataset (genomic plus phenotypic data) is made accessible to the dataset submitter, which is communicated to the submitter by email. Under certain circumstances, a submitter can request a longer embargo period, but this must be requested at the time of submission and a justification must be provided.
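As a worked example of the embargo timing, the sketch below computes an end date from the date of the notification email, using the 6-month embargo mentioned in this FAQ and an optional longer period for approved extensions. The function names are hypothetical, and treating the returned date as the day access opens (with the day clamped to the end of shorter months) is an assumption, not a stated GPAP rule.

```python
import calendar
from datetime import date

DEFAULT_EMBARGO_MONTHS = 6  # standard embargo length mentioned in this FAQ


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_end(notified_on: date, months: int = DEFAULT_EMBARGO_MONTHS) -> date:
    """Embargo counted from the email telling the submitter the complete
    dataset is accessible; months > 6 models an approved, justified extension."""
    return add_months(notified_on, months)


print(embargo_end(date(2024, 1, 15)))  # 2024-07-15
print(embargo_end(date(2023, 8, 31)))  # 2024-02-29 (clamped to the end of February)
```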
1.5.4.Can commercial companies access patient data?
1.5.5.How do you protect patient security and confidentiality?
1.5.6.Will I know if someone accessed my data?
1.5.7.What can users do with the data?
After embargo, datasets are available for searching and querying by other authorised users.
1.5.8.Can I download data from GPAP?
1.5.9.How does GPAP safeguard the data?
Data is stored in a cluster with a very restricted access policy, limited internet access and daily backups. GPAP security was audited in October 2017, and no major risks were identified. GPAP requests and user actions are securely logged for audit purposes.
1.5.10.Is GPAP compliant with the General Data Protection Regulation (GDPR)?
1.5.11.Is there an audit trail of the access requests?
All activities of each user in GPAP are logged by the system for future audit.
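One common way to make an activity log useful for audit is to chain entries by hash, so that altering or deleting an earlier entry is detectable. The sketch below illustrates that technique together with a query answering "who has queried this dataset?". It is a hypothetical example; the `AuditLog` class and its fields are invented and do not describe how GPAP actually stores its logs.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Append-only log; each entry stores the hash of the previous entry."""

    def __init__(self):
        self.entries: list[dict] = []

    @staticmethod
    def _digest(entry: dict) -> str:
        body = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, user: str, action: str, dataset: str) -> dict:
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "dataset": dataset,
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """False if any entry was modified, removed or reordered."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != self._digest(e):
                return False
            prev = e["hash"]
        return True

    def who_queried(self, dataset: str) -> set[str]:
        return {e["user"] for e in self.entries if e["dataset"] == dataset and e["action"] == "query"}
```

The same log serves both purposes described in this FAQ: auditors can check its integrity with `verify()`, and submitters can see which users have queried their data.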
1.5.12.Can a patient be identified via the GPAP?
1.5.13.Can patients request to remove their personal information from the GPAP database?
1.5.14.How are GPAP users authenticated?
1.5.15.Are there secure links between GPAP and the databases connected to it?
1.5.16.What are the procedures in case of technical failure?
1.6.1.Which ethics committee has approved the RD-Connect GPAP?
1.6.2.Who owns the GPAP database and the relating intellectual property rights on it?
1.6.3.Who owns the data in GPAP?
1.6.4.Who oversees the GPAP and its use?
1.6.5.Which are the rules governing the use of the GPAP database?
1.6.6.What consent form is used for the data in GPAP?
2.1.I have a rare disease. Can I send you my genomic data?
2.2.I have a rare disease. Can you tell me if you have my data in the RD-Connect GPAP?
2.3.My doctor has told me that my data was sent to RD-Connect. Can I register to see my own data in the RD-Connect GPAP?
2.4.Is it safe for my data to be in the RD-Connect GPAP?
2.5.Can I request to remove my personal information from the GPAP database?
2.6.Do patients need to consent for their data to be stored in RD-Connect?
In all cases, your doctor should have told you that your data and your biosample (your blood or tissue sample, or the DNA extracted from it) might be shared with other doctors and scientists across the world to find a diagnosis or to do research into your condition. Your doctor might not have mentioned RD-Connect explicitly: sometimes “broad consent” is given, which means that patients consent to sharing in general, not only for a particular project. This can be useful, for example, if a doctor wants to use a patient’s biosamples or data in several projects, because the first project did not find the diagnosis or answer the research question, or because different projects address different research questions about the same disease. In this case, the patient is asked whether they agree to the different possible uses of their biosamples and data in general, rather than to a specific named project.