A Tool to Assess FAIRness (using FAIR Principles) of DBpedia DataID - GSoC2022

Description:
The FAIR Data Principles are a set of guiding principles for making data findable, accessible, interoperable and reusable (Wilkinson et al., 2016). FAIR data supports reuse and enables computers to find and use data. Several metrics define the FAIRness of data [1], and their descriptions are given here [2]:

TO BE FINDABLE:

F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.

TO BE ACCESSIBLE:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
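
As a rough illustration of how A1 and A1.1 could be turned into an automated check, here is a minimal sketch in Python. The function name `uses_open_protocol` and the set of schemes treated as "open" are assumptions made for this example; they do not come from the FAIR metrics or from DataID.

```python
from urllib.parse import urlparse

# Assumption for this sketch: these URL schemes count as open, free and
# universally implementable protocols in the sense of A1.1.
OPEN_SCHEMES = {"http", "https", "ftp"}


def uses_open_protocol(identifier: str) -> bool:
    """Return True if the identifier is a URL with an open scheme and a host part."""
    parts = urlparse(identifier)
    return parts.scheme in OPEN_SCHEMES and bool(parts.netloc)
```

A check like this only looks at the form of the identifier. It does not test whether the (meta)data can actually be retrieved, which would need a real request against the identifier.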

TO BE INTEROPERABLE:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.

TO BE RE-USABLE:

R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.

Goals:

  • creating and implementing FAIR data metrics in your own favourite language
  • assess DataID and produce assessment scores with standard Linked Data tools
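
To give an idea of what the two goals could look like together, here is a minimal sketch in Python that runs a few metric checks over a metadata record and produces a score. It is only a sketch under stated assumptions: the field names (`identifier`, `title`, `description`, `license`, `provenance`), the choice of checks and the equal weighting are all hypothetical, and the record is a plain dictionary rather than a real DataID parsed with Linked Data tools.

```python
from typing import Callable, Dict, Tuple

Metadata = Dict[str, str]

# Hypothetical checks: each maps a FAIR principle to a simple test on the
# record. The field names are illustrative, not DataID vocabulary terms.
CHECKS: Dict[str, Callable[[Metadata], bool]] = {
    "F1": lambda m: m.get("identifier", "").startswith(("http://", "https://")),
    "F2": lambda m: bool(m.get("title")) and bool(m.get("description")),
    "R1.1": lambda m: bool(m.get("license")),
    "R1.2": lambda m: bool(m.get("provenance")),
}


def assess(metadata: Metadata) -> Tuple[Dict[str, bool], float]:
    """Run every check and return per-principle results plus the passed fraction."""
    results = {name: check(metadata) for name, check in CHECKS.items()}
    score = sum(results.values()) / len(results)
    return results, score


if __name__ == "__main__":
    record = {
        "identifier": "https://example.org/dataset/1",
        "title": "Example dataset",
        "description": "An illustrative record.",
        "license": "https://creativecommons.org/licenses/by/4.0/",
    }
    results, score = assess(record)
    print(results, score)
```

With the example record, three of the four checks pass (provenance is missing), so the score is 0.75. A real tool would read the DataID as RDF and would probably weight the principles differently.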

Impact:

DBpedia publishes DataID metadata describing datasets when producing data on the fly. DBpedia DataIDs cover some of the FAIR principles; however, most of the time they fail the assessment, either because of the generic structure of the assessment tools or simply because the DataID does not have the identifiers. This project will therefore increase the FAIRness of DBpedia.

Mentors

Beyza Yaman

Project size (175h, 350h)

175

Warm-up tasks:

[1] https://www.force11.org/group/fairgroup/fairprinciples
[2] https://github.com/FAIRMetrics/Metrics/blob/master/MaturityIndicators/Gen1/ALL.pdf

Unless one takes the general view that none of this data is legally encumbered — either through copyright in collections, 96/9/EC database protection (EEA and UK), or some other related legislation — then public license compatibility must also be considered. Here is my recent stab at public data license interoperability:

Additional background can be found here:

  • Morrison, Robbie (6 February 2022). Which open data license? — Release 06. doi:10.5281/zenodo.5987672. 14 pages.

Although the developers of the FAIR principles did not grasp these particular legal nettles, that does not mean they should continue to be ignored by those who follow.

Moreover, efforts to get suitable open licenses on information collected by public organizations like the International Energy Agency, the World Bank Group, the United Nations, sector regulators in Europe, and so on need to continue. HTH R.

Hello @beyza ,

My name is Rishabh Chandra. I am currently working as a full-time software engineer at Qualcomm, HYD, India. I completed my Master’s in Computer Science in 2020 at IIIT Hyderabad, India.

After going through the warm-up tasks, my initial impression is that this would involve Information Retrieval and Extraction as well as statistics. I did hands-on projects on these topics during my master’s.
I have a keen interest in studying large networks/Linked Data and was hoping to contribute to DBpedia.

Looking forward to connecting for a better understanding of the project.

Hi Robbie,

Thanks for the detailed explanation. The FAIR Principles aim at better data structuring and stewardship for scientific data management. At the moment I don’t think there is a set of norms on how to do that, but I guess we could say that there are some recommendations.

I guess your work could be a good option to represent the principles which are focusing on licensing issues. If this project is accepted then we could discuss this with students and mentors (or you if you have time).

Thanks for the recommendation.

Hi Rishabh. Thanks for your interest in this project.

Did you work with Linked Data before?

If you have specific questions related to this project, you can ask me via private message or on the forum. If you are still interested after that, you can start writing a project proposal for your ideas.

@beyza I am more than happy to contribute to further discussions on data licensing. On that note, I am currently developing a short presentation for a webinar on 1 March 2022 in the context of data requirements for life cycle analysis. My slide deck is titled “On the legal reusability of public data in Europe”. More on that event here:

That sounds great Robbie. Thanks for the info!
