Ontology Time Machine / Package Manager using DBpedia Archivo — GSoC 2024

FAIR’Ar - FAIRer Ontologies with DBpedia Archivo

Introduction

DBpedia Archivo is one of the biggest and most recent ontology archives (check out the paper).
Developing data-driven apps with ontologies has many advantages: for example, you can reuse other people's work, such as schema definitions, so that your apps integrate easily with existing data. However, this web of interlinked ontologies (one ontology usually has several, possibly transitive, dependencies) is out of your control: an ontology can change (evolution) or become unavailable at any time. DBpedia Archivo comes to the rescue because it provides versioned snapshots of over 1800 ontologies. What is still missing is a dependency manager, like pip for PyPI or Maven for Maven Central.

Impact

Allow building reliable semantic apps, fault-tolerant workflows, and reproducible experiments by combining DBpedia Archivo with a flexible ontology provider that serves as a time machine proxy, ontology package installer, and ontology dependency management system, in order to overcome ontology evolution and availability issues.

Goal

Let’s have FAIR’Ar Ontologies for Linked Data and RDF Knowledge Graphs. With DBpedia Archivo, we want to improve the FAIRness of ontologies through 3 major strategies:

  • Creation of a time machine proxy that makes it possible to access a specific point in time and the corresponding versions of ontologies hosted on Archivo, even if they are no longer available at the original source
  • Evaluation and representation of the FAIRness of an ontology and its dependencies so that developers know whether it is a good choice to adopt a particular ontology
  • Extension and improvement of Archivo to support and implement the above points
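
As a rough illustration of the first strategy, the time-based lookup could work along these lines. This is only a sketch under assumptions: the snapshot list, the `select_version` and `parse_accept_datetime` helpers, and the example timestamps are hypothetical and not part of Archivo's actual API. The only convention borrowed from elsewhere is the Memento `Accept-Datetime` request header, which carries an HTTP-date.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Sequence


def select_version(snapshots: Sequence[datetime], target: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken at or before `target`, or None if there is none."""
    ordered = sorted(snapshots)
    idx = bisect_right(ordered, target)  # number of snapshots with timestamp <= target
    return ordered[idx - 1] if idx else None


def parse_accept_datetime(header: str) -> datetime:
    """Parse an HTTP-date such as the value of a Memento Accept-Datetime header."""
    return parsedate_to_datetime(header)


# Hypothetical snapshot timestamps for one ontology (made up for illustration).
SNAPSHOTS = [
    datetime(2020, 6, 1, tzinfo=timezone.utc),
    datetime(2021, 3, 15, tzinfo=timezone.utc),
    datetime(2023, 1, 10, tzinfo=timezone.utc),
]

requested = parse_accept_datetime("Tue, 01 Feb 2022 00:00:00 GMT")
chosen = select_version(SNAPSHOTS, requested)
print(chosen)
```

Returning `None` when no snapshot is old enough leaves it to the caller to decide what to serve in that case, for example an error response or the oldest known version.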

Steps

  • realize ontology wayback machine (transparent) proxy (optionally simulating a memento TimeGate)
    • time-based mode: serve versions archived for a certain point in time
    • dependent-lock based mode: serve specific versions based on a local manifest or manifest in (transitively) included ontologies
    • failover mode: redirect to the latest archived version in the event that an ontology is not available anymore
  • realize dependency analysis for every ontology version (owl:imports and referenced ontologies)
  • assess the availability and the FAIRness score for ontology versions
    • FAIR assessment tool integration
    • availability score based on DBpedia Archivo availability monitoring
    • aggregated score based on (transitive) dependency scores
  • extend ontology version numbers with additional overlays (e.g. owl:versionIRI)
  • realize dependency “package” manager and lockfile option
  • track ontology usage via the proxy (such that ontologies not yet added can be suggested for inclusion in Archivo)
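
To make the dependency analysis and the aggregated score from the steps above more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the example IRIs, the per-ontology scores, the "weakest link" aggregation (the minimum over an ontology and all its transitive dependencies), and the choice to score unknown ontologies as 0.0. The project may well settle on a different aggregation.

```python
from typing import Dict, List, Set

# Hypothetical direct dependencies (e.g. as extracted from owl:imports statements).
DEPENDENCIES: Dict[str, List[str]] = {
    "http://example.org/app": ["http://example.org/core", "http://example.org/geo"],
    "http://example.org/core": ["http://example.org/base"],
    # geo points back to app, so the graph contains a cycle
    "http://example.org/geo": ["http://example.org/base", "http://example.org/app"],
    "http://example.org/base": [],
}

# Hypothetical per-ontology scores in [0, 1] (FAIRness or availability).
OWN_SCORES: Dict[str, float] = {
    "http://example.org/app": 0.9,
    "http://example.org/core": 0.6,
    "http://example.org/geo": 0.95,
    "http://example.org/base": 0.85,
}


def transitive_dependencies(root: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect all direct and indirect dependencies of `root`, tolerating cycles."""
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node == root or node in seen:
            continue  # already visited, or a cycle leading back to the root
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


def aggregated_score(root: str, graph: Dict[str, List[str]], scores: Dict[str, float]) -> float:
    """Weakest-link aggregation: the minimum score over root and its dependencies.

    Ontologies without a known score count as 0.0 (an assumption of this sketch).
    """
    nodes = {root} | transitive_dependencies(root, graph)
    return min(scores.get(node, 0.0) for node in nodes)


print(aggregated_score("http://example.org/app", DEPENDENCIES, OWN_SCORES))
```

The minimum is used here because a single unavailable or poorly documented dependency can break a consumer; a weighted average would be an equally plausible alternative if a softer signal is preferred.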

Skills/Technology: Python (required), RDF, OWL, HTTP

Warm-up tasks

easy

medium

  • Set up your own instance of Archivo; create a fork of the GitHub project and extend the insufficient dependency tree (Pull Request #29), integrating it prototypically into the ontology version view of Archivo as a separate dependency “tab”

Project size

  • 350 hours

Mentors

Johannes Frey, Dr. Natanael Arndt

Hey, I wanted to look into this idea. How do I get started? I have some prior experience working with DBpedia, even though it was only for a bit, and I would like more information.

For the warm-up tasks, you can look at this thread (the easy task), Archivo Ontolysense - GSoC 2023, and also at the end of the thread.

Will this project be allocated a slot?

Every project will have the opportunity to get a slot. For more details see How will the projects be allocated given there will be limited slots.

But of course the quality of the proposal can affect the ranking. In particular, if the proposal is low-quality, AI-generated content, it is likely that we will not “endorse” it.