FAIRification of rare disease registries

Create interoperable data I

To make data interoperable, i.e., machine-readable, we first define a driving user question, e.g., “What effect do different treatments have on disease severity?”. The driving user question should be defined by the domain expert together with a FAIR data expert. This ensures that it reflects the research interest of the domain expert and also makes full use of the potential of Linked Data, as advised by the FAIR data expert.

Create interoperable data II

The question is analysed, and the specific data elements and items required to answer it are identified. Here, these could be the treatment(s) and the symptoms/phenotypes that can be linked to disease severity for each patient. Unique persistent identifiers are then assigned to the data. At this stage it is important to think about the relations between the concepts.
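As an illustration of the identifier step, the sketch below mints a stable IRI for each record from its local registry ID. The base namespace `https://example.org/registry/`, the function name `mint_iri` and the example IDs are hypothetical and not part of any tool named here. Note that the code only makes identifiers unique and reproducible; persistence still depends on the registry keeping the namespace resolvable over time.

```python
import uuid

# Hypothetical base namespace; a real registry would use a domain it controls.
BASE_IRI = "https://example.org/registry/"


def mint_iri(kind: str, local_id: str) -> str:
    """Return a deterministic IRI for a record of the given kind.

    uuid5 is a name-based UUID, so the same (kind, local_id) pair always
    produces the same identifier, and the local ID itself is not exposed.
    """
    name = f"{BASE_IRI}{kind}/{local_id}"
    return f"{BASE_IRI}{kind}/{uuid.uuid5(uuid.NAMESPACE_URL, name)}"


if __name__ == "__main__":
    # Hypothetical local IDs, for illustration only.
    print(mint_iri("patient", "p001"))
    print(mint_iri("treatment", "t17"))
```

Deriving the identifier from the local ID means that re-running the FAIRification on updated data yields the same IRIs, so existing links to a patient or treatment record keep working.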

Create interoperable data III

These concepts and relations are used to create (or choose) a semantic data model that ‘guides’ the creation of machine-readable data.

Create interoperable data IV

The creation of machine-readable data can be done using the DTL FAIRifier (based on OpenRefine and its RDF plug-in) or the data integration services in MOLGENIS.
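To give an idea of what such tools produce, the sketch below converts one tabular registry record into RDF, serialised as N-Triples, using only the Python standard library. It is not how the FAIRifier or MOLGENIS work internally. The namespace `https://example.org/registry/`, the class `Patient`, the properties `receivedTreatment` and `severityScore`, and the column names are all hypothetical; a real semantic data model would reuse terms from established ontologies.

```python
# Hypothetical namespace for illustration; not a real vocabulary.
EX = "https://example.org/registry/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype: str = "") -> str:
    """Serialise a value as an N-Triples literal, escaping special characters."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    text = f'"{escaped}"'
    return f"{text}^^{iri(datatype)}" if datatype else text


def row_to_ntriples(row: dict) -> list:
    """Map one registry record onto the (hypothetical) semantic data model."""
    patient = iri(EX + "patient/" + row["patient_id"])
    return [
        f"{patient} {iri(RDF_TYPE)} {iri(EX + 'Patient')} .",
        f"{patient} {iri(EX + 'receivedTreatment')} {literal(row['treatment'])} .",
        f"{patient} {iri(EX + 'severityScore')} {literal(row['severity'], XSD_INTEGER)} .",
    ]


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    record = {"patient_id": "p001", "treatment": "drug A", "severity": 3}
    print("\n".join(row_to_ntriples(record)))
```

Each output line is one subject-predicate-object statement, which is what makes the record readable by machines without knowledge of the original column layout.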

Create interoperable metadata with clearly defined accessibility and clear description to enable reusability

When we FAIRify rare disease registries, we often have little or no metadata to begin with, so the metadata needs to be created from scratch. This requires work from, e.g., the data owner to describe how the data was created (reusability), an in-house legal expert to define the accessibility, and a FAIR expert to make the metadata interoperable, i.e., machine-readable. Machine-readable metadata can be created using the DTL Metadata Editor. We use the Data Catalog Vocabulary (DCAT) to describe metadata, and we are currently defining a standard for rare disease registries that specifies how registries can share and reuse metadata models and that addresses crucial concerns about data access permissions and data security.
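A minimal sketch of what a DCAT description of a registry could look like, written as JSON-LD with the Python standard library. This is not the output format of the DTL Metadata Editor. The function name, the title, the description and all `example.org` IRIs are hypothetical placeholders; the terms `dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, `dcat:accessURL`, `dct:title`, `dct:description`, `dct:license` and `dct:accessRights` come from DCAT and Dublin Core Terms.

```python
import json


def build_dataset_metadata(title, description, license_iri, access_rights_iri, access_url):
    """Describe a registry as a dcat:Dataset with one dcat:Distribution."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        # Reusability: how the data was created, supplied by the data owner.
        "dct:description": description,
        # Accessibility: conditions defined by the legal expert.
        "dct:accessRights": {"@id": access_rights_iri},
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dct:license": {"@id": license_iri},
            "dcat:accessURL": {"@id": access_url},
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    metadata = build_dataset_metadata(
        title="Example rare disease registry",
        description="Clinical data collected at enrolment and yearly follow-up.",
        license_iri="https://example.org/licenses/registry-terms",
        access_rights_iri="https://example.org/access/restricted",
        access_url="https://example.org/registry/request-access",
    )
    print(json.dumps(metadata, indent=2))
```

Keeping the access conditions and the licence as IRIs rather than free text lets a machine compare them across registries.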

FAIR data point (FDP) I

Metadata is made findable for machines through FDPs and describes under which conditions the data can be accessed and how it can be, e.g., reproduced. An FDP needs to be indexed by a search engine in order to be findable. This can be, e.g., Google; however, at the Dutch Techcentre for Life Sciences (DTL) efforts are underway to build a ‘FAIR search engine’. The FAIR data point RESTful API uses the DCAT vocabulary and DataCite’s Registry of Research Data Repositories (RE3Data) to provide high-level metadata descriptors about data deposits, and to provide instructions for accessing the various distributions of data sets.
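Because the FDP exposes its metadata through a RESTful API, a client can in principle retrieve it with a plain HTTP GET. The sketch below assumes that the FDP returns RDF (here Turtle) when asked for it via the `Accept` header; this is an assumption to verify against the FDP deployment in question. The URL `https://example.org/fdp` and both function names are hypothetical.

```python
import urllib.request


def build_fdp_request(fdp_url: str, rdf_format: str = "text/turtle") -> urllib.request.Request:
    """Prepare a GET request that asks the FDP for machine-readable metadata."""
    return urllib.request.Request(fdp_url, headers={"Accept": rdf_format})


def fetch_fdp_metadata(fdp_url: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text.

    Performs a network call, so it is not invoked in the example below.
    """
    with urllib.request.urlopen(build_fdp_request(fdp_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical FDP address, for illustration only.
    request = build_fdp_request("https://example.org/fdp")
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```

A harvester or search engine would typically start from this top-level document and follow the links it contains down to the catalogue, data set and distribution descriptions.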

FAIR data point (FDP) II

Finally, the driving user question defined in Step 1 can be answered by writing SPARQL queries.
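For the example question about treatments and disease severity, such a query could look like the sketch below. The prefix `ex:`, the class `ex:Patient` and the properties `ex:receivedTreatment` and `ex:severityScore` are hypothetical stand-ins for whatever terms the chosen semantic data model uses, and the endpoint URL is a placeholder. The helper only builds the request URL, following the SPARQL 1.1 Protocol convention of passing the query in the `query` parameter of a GET request; it does not contact any server.

```python
import urllib.parse

# Hypothetical vocabulary; replace with the terms of the actual data model.
QUERY = """\
PREFIX ex: <https://example.org/registry/>

SELECT ?treatment (AVG(?severity) AS ?meanSeverity) (COUNT(?patient) AS ?patients)
WHERE {
  ?patient a ex:Patient ;
           ex:receivedTreatment ?treatment ;
           ex:severityScore ?severity .
}
GROUP BY ?treatment
ORDER BY ?meanSeverity
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL for the given endpoint."""
    return f"{endpoint}?{urllib.parse.urlencode({'query': query})}"


if __name__ == "__main__":
    # Hypothetical SPARQL endpoint, for illustration only.
    print(build_query_url("https://example.org/sparql", QUERY))
```

The query groups patients by treatment and reports the mean severity score per group, which is one simple way to start comparing treatments; a real analysis would also need to account for confounders and follow-up time.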