Ontology Time Machine / Package Manager using DBpedia Archivo — GSoC 2024

jfrey · February 1, 2024, 1:54am

FAIR’Ar - FAIRer Ontologies with DBpedia Archivo

Introduction

DBpedia Archivo is one of the largest and most recent ontology archives (check out the paper).
While developing data-driven apps with ontologies has many advantages (e.g. you can reuse other people’s work, such as schema definitions, so that your apps integrate easily with existing data), the web of interlinked ontologies (an ontology usually has several, possibly transitive, dependencies) is out of your control: an ontology can change (evolution) or become unavailable at any time. DBpedia Archivo comes to the rescue, since it provides versioned snapshots of over 1800 ontologies. What is still missing, however, is a dependency manager comparable to pip for PyPI or Maven for Maven Central.

Impact

Allow building reliable semantic apps, fault-tolerant workflows, and reproducible experiments by combining DBpedia Archivo with a flexible ontology provider that serves as a time machine proxy, ontology package installer, and ontology dependency management system, overcoming ontology evolution and availability issues.

Goal

Let’s have FAIR’Ar Ontologies for Linked Data and RDF Knowledge Graphs. Building on DBpedia Archivo, we want to improve the FAIRness of ontologies through three major strategies:

Creation of a time machine proxy that makes it possible to access a specific point in time and the corresponding versions of ontologies hosted on Archivo, even if they are no longer available at the original source
Evaluation and representation of the FAIRness of an ontology and its dependencies so that developers know whether it is a good choice to adopt a particular ontology
Extension and improvement of Archivo to support and implement the above points

Steps

realize an ontology Wayback Machine (transparent) proxy (optionally simulating a Memento TimeGate)
- time-based mode: serve versions archived for a certain point in time
- dependency-lock-based mode: serve specific versions based on a local manifest or a manifest in (transitively) included ontologies
- failover mode: redirect to the latest archived version in the event that an ontology is not available anymore
realize dependency analysis for every ontology version (owl:imports and referenced ontologies) and assess the availability and the FAIRness score of each ontology version
- FAIR assessment tool integration
- availability score based on DBpedia Archivo availability monitoring
- aggregated score based on (transitive) dependency scores
extend ontology version numbers with additional overlays (e.g. owl:versionIRI)
realize a dependency “package” manager and a lockfile option
track ontology usage via the proxy (so that ontologies not yet in Archivo can be suggested for inclusion)
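The three proxy modes above (time-based, dependency-lock, failover) can be sketched as one version-resolution function. This is a minimal illustration under assumptions, not Archivo’s API: the list of snapshot timestamps per ontology, the JSON lockfile layout (a hypothetical `ontologies` object mapping IRIs to pinned versions), the use of ISO timestamps as version identifiers, and the precedence lock > time > failover are all invented for the example. `Accept-Datetime` is the request header a Memento client sends to a TimeGate to ask for a point in time.

```python
from __future__ import annotations

import json
from bisect import bisect_right
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_accept_datetime(header: str) -> datetime:
    """Parse an Accept-Datetime value (RFC 1123 date, e.g.
    'Thu, 01 Feb 2024 00:00:00 GMT') into an aware UTC datetime."""
    dt = parsedate_to_datetime(header)
    if dt.tzinfo is None:  # a '-0000' zone yields a naive datetime; treat as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def resolve_version(
    iri: str,
    snapshots: list[datetime],
    *,
    lockfile: Optional[str] = None,
    at: Optional[datetime] = None,
    original_available: bool = True,
) -> Optional[str]:
    """Return the archived version (ISO timestamp) to serve for `iri`.

    None means no archived version applies, e.g. the request is passed
    through to the original source. Assumed precedence:
    lockfile pin > time-based > failover."""
    # Dependency-lock mode: a version pinned in the (hypothetical) lockfile wins.
    if lockfile is not None:
        pinned = json.loads(lockfile).get("ontologies", {}).get(iri)
        if pinned is not None:
            return pinned
    ordered = sorted(snapshots)
    # Time-based mode: the latest snapshot taken at or before `at`.
    if at is not None:
        idx = bisect_right(ordered, at)
        return ordered[idx - 1].isoformat() if idx > 0 else None
    # Failover mode: the original is unavailable, so serve the latest snapshot.
    if not original_available and ordered:
        return ordered[-1].isoformat()
    return None
```

For example, `resolve_version(iri, snapshots, at=parse_accept_datetime("Sun, 31 Dec 2023 00:00:00 GMT"))` picks the newest snapshot not later than that instant; a real proxy would then fetch that version from Archivo and, when acting as a TimeGate, redirect the client to it.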

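The dependency analysis and the aggregated score from the steps above can be sketched over an import graph. Here the graph is assumed to be already extracted (ontology IRI mapped to the set of IRIs it references via owl:imports), the per-ontology scores are placeholder numbers, and taking the minimum over the transitive closure is only one possible aggregation (an ontology is treated as no better than its weakest dependency); the proposal does not fix a formula.

```python
from __future__ import annotations

from collections import deque


def transitive_dependencies(root: str, imports: dict[str, set[str]]) -> set[str]:
    """All ontologies reachable from `root` via owl:imports (root excluded).

    Breadth-first search; the `seen` set makes import cycles harmless."""
    seen: set[str] = set()
    queue = deque(imports.get(root, ()))
    while queue:
        iri = queue.popleft()
        if iri == root or iri in seen:
            continue
        seen.add(iri)
        queue.extend(imports.get(iri, ()))
    return seen


def aggregated_score(root: str, imports: dict[str, set[str]], scores: dict[str, float]) -> float:
    """Hypothetical aggregation: the minimum score over `root` and its
    transitive dependencies. Ontologies without a score count as 0.0."""
    members = {root} | transitive_dependencies(root, imports)
    return min(scores.get(iri, 0.0) for iri in members)
```

With `imports = {"app": {"core", "geo"}, "geo": {"core"}}` and scores 0.9, 0.8 and 0.6 for `app`, `core` and `geo`, `aggregated_score("app", imports, scores)` is 0.6, because `geo` is the weakest link. A weighted average would be more forgiving of a single weak dependency; which behaviour is desirable is an open design choice.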
Skills/Technology: Python (required), RDF, OWL, HTTP

Warm-up tasks

easy

Download a sample of the latest Archivo ontologies (as N-Triples files, using this Databus Collection https://databus.dbpedia.org/denis/collections/latest_ontologies_as_nt_sample/ ) and load them into a Virtuoso instance (see Archivo - Ontology Access for instructions). Write a SPARQL query to count all classes defined in the downloaded ontologies. Then write a grep or awk command that counts all classes directly in the downloaded files, and check whether the two results differ. For pitfalls, see Archivo Ontolysense - GSoC 2023.
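To see where the two counts can diverge, here is a rough Python stand-in for the grep/awk side that also breaks down the likely causes. It is a sketch assuming one triple per line (as in N-Triples) and counting only subjects typed `owl:Class`; `rdfs:Class` declarations are not matched. Line-based counting counts the same class IRI once per file that declares it, and anonymous class expressions (blank nodes typed `owl:Class`) match too, whereas a SPARQL query with `COUNT(DISTINCT ?c)` deduplicates repeated IRIs and can exclude blank nodes with `FILTER(isIRI(?c))`. The helper names are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterable, Iterator

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# One N-Triples statement "<subject> rdf:type owl:Class ." where the subject
# is either an IRI (<...>) or a blank node label (_:...).
CLASS_TRIPLE = re.compile(
    r"^(<[^>]*>|_:\S+)\s+<" + re.escape(RDF_TYPE) + r">\s+<"
    + re.escape(OWL_CLASS) + r">\s*\.\s*$"
)


def iter_lines(paths: Iterable[Path]) -> Iterator[str]:
    """Yield every line of every file, closing each file when done."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield from fh


def count_classes(lines: Iterable[str]) -> tuple[int, int, int]:
    """Return (matching lines, distinct class IRIs, blank-node class lines)."""
    matching = 0
    named: set[str] = set()
    blank = 0
    for line in lines:
        m = CLASS_TRIPLE.match(line)
        if m is None:
            continue
        matching += 1  # roughly what a line-based grep count would report
        subject = m.group(1)
        if subject.startswith("_:"):
            blank += 1  # anonymous class expression, not a named class
        else:
            named.add(subject)
    return matching, len(named), blank
```

Usage: `count_classes(iter_lines(sorted(Path("ontologies").glob("*.nt"))))`, where `ontologies` stands for whatever local folder holds the downloaded files.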

medium

Set up your own instance of Archivo; create a fork of the GitHub project and extend the insufficient dependency-tree prototype from Pull Request #29 into the ontology version view of Archivo as a separate “dependency” tab.
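A dependency tab essentially needs the import hierarchy of one ontology version rendered as a tree. A minimal rendering sketch, assuming the imports mapping for that version has already been extracted (the data structure and function name are hypothetical, not taken from Archivo or Pull Request #29): a dependency that is already on the current path is marked as a cycle instead of being expanded again.

```python
from __future__ import annotations


def render_tree(root: str, imports: dict[str, set[str]]) -> list[str]:
    """Render the owl:imports hierarchy below `root` as indented lines."""
    lines: list[str] = []

    def walk(iri: str, depth: int, path: frozenset[str]) -> None:
        # An IRI already on the path from the root closes an import cycle.
        cycle = iri in path
        lines.append("  " * depth + iri + (" (cycle)" if cycle else ""))
        if cycle:
            return
        for dep in sorted(imports.get(iri, ())):  # sorted for a stable view
            walk(dep, depth + 1, path | {iri})

    walk(root, 0, frozenset())
    return lines
```

Shared dependencies (diamonds) are printed under each parent, which keeps the view faithful to the hierarchy but can grow quickly for deep import chains; collapsing already-expanded subtrees is one way to bound it.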

Project size

350 hours

Mentors

Johannes Frey, Dr. Natanael Arndt

thedonutcat · February 23, 2024, 2:37am

Hey, I wanted to look into this idea. How do I get started? I have some prior experience working with DBpedia, although only briefly, and would like to know more.

jfrey · February 23, 2024, 9:59pm

For warm-up tasks, see this thread (the easy task): Archivo Ontolysense - GSoC 2023, and also the end of the thread.

A medium warm-up task is to set up your own instance of Archivo, create a fork of the GitHub project, and extend the insufficient dependency-tree prototype from the PR “Created dependency tree” by GOVINDFROMINDIA (Pull Request #29 in dbpedia/archivo on GitHub) into the ontology version view of Archivo as a separate “dependency” tab.

Johnathan · March 3, 2024, 1:55pm

Will this project be allocated a slot?

jfrey · March 11, 2024, 7:27am

Every project will have the opportunity to get a slot. For more details, see “How will the projects be allocated given there will be limited slots”.

Of course, the quality of the proposal can affect the ranking. In particular, if a proposal is low quality and consists of AI-generated content, we will likely not “endorse” it.