FAIR’Ar - FAIRer Ontologies with DBpedia Archivo
Introduction
DBpedia Archivo is one of the biggest and most recent ontology archives (check out the paper).
Developing data-driven apps with ontologies has many advantages: for example, you can re-use the work of other people, such as schema definitions, so that your apps integrate easily with existing data. However, this web of interlinked ontologies (one ontology usually has several, possibly transitive, dependencies) is out of your control: an ontology can change (evolution) or become unavailable at any time. DBpedia Archivo comes to the rescue, since it provides versioned snapshots of over 1800 ontologies. At the moment, however, a dependency manager like pip for PyPI or Maven for Maven Central is missing.
Impact
Allow building reliable semantic apps, fault-tolerant workflows, and reproducible experiments by using DBpedia Archivo in combination with a flexible ontology provider that serves as a time machine proxy, ontology package installer, and ontology dependency management system, in order to overcome ontology evolution and availability issues.
Goal
Let’s have FAIR’Ar Ontologies for Linked Data and RDF Knowledge Graphs. With DBpedia Archivo we want to improve the FAIRness of ontologies through three major strategies:
- Creation of a time machine proxy that makes it possible to access a specific point in time and the corresponding versions of ontologies hosted on Archivo, even if they are no longer available at the original source
- Evaluation and representation of the FAIRness of an ontology and its dependencies so that developers know whether it is a good choice to adopt a particular ontology
- Extension and improvement of Archivo to support and implement the above points
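To illustrate the time machine idea from the first strategy, here is a minimal sketch of how a proxy might pick the snapshot to serve for a requested point in time. It assumes that version labels follow a `YYYY.MM.DD-HHMMSS` pattern and that the list of archived versions for an ontology has already been fetched; the function name `select_version` and the sample labels are hypothetical and not part of Archivo.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence

# Assumption: version labels look like "2020.06.10-175249".
VERSION_FORMAT = "%Y.%m.%d-%H%M%S"


def select_version(versions: Sequence[str], target: datetime) -> Optional[str]:
    """Return the latest version archived at or before `target`, or None."""
    # Pair each label with its parsed timestamp and sort chronologically.
    parsed = sorted((datetime.strptime(v, VERSION_FORMAT), v) for v in versions)
    timestamps = [ts for ts, _ in parsed]
    # Index of the first snapshot that is strictly newer than the target.
    idx = bisect_right(timestamps, target)
    if idx == 0:
        return None  # nothing was archived yet at that point in time
    return parsed[idx - 1][1]


if __name__ == "__main__":
    # Hypothetical version labels for one ontology.
    archived = ["2020.06.10-175249", "2021.01.08-020000", "2022.03.15-101500"]
    print(select_version(archived, datetime(2021, 6, 1)))
```

Choosing the newest snapshot at or before the requested time is only one possible policy; a Memento-style TimeGate could instead return the snapshot closest in time in either direction.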
Steps
- realize an ontology wayback machine as a (transparent) proxy (optionally simulating a Memento TimeGate)
  - time-based mode: serve the versions archived for a certain point in time
  - dependency-lock-based mode: serve specific versions based on a local manifest or on a manifest in (transitively) included ontologies
  - failover mode: redirect to the latest archived version in the event that an ontology is not available anymore
- realize dependency analysis for every ontology version (owl:imports and referenced ontologies)
- assess the availability and the FAIRness score for ontology versions
  - FAIR assessment tool integration
  - availability score based on DBpedia Archivo availability monitoring
  - aggregated score based on (transitive) dependency scores
- extend ontology version numbers with additional overlays (e.g. owl:versionIRI)
- realize a dependency “package” manager and lockfile option
- track ontology usage via the proxy (such that ontologies not added yet can be suggested for inclusion in Archivo)
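The aggregated score over (transitive) dependencies mentioned in the steps could be sketched as follows. This is only an illustration under stated assumptions: the import graph and the per-ontology scores are hypothetical in-memory dictionaries, the function names are made up, and taking the minimum ("weakest link") is just one possible aggregation, not a method defined by Archivo.

```python
from typing import Dict, Iterable, Set


def transitive_dependencies(ontology: str,
                            imports: Dict[str, Iterable[str]]) -> Set[str]:
    """Collect all direct and indirect dependencies of `ontology`."""
    seen: Set[str] = set()
    stack = list(imports.get(ontology, ()))
    while stack:
        dep = stack.pop()
        # Import graphs may contain cycles, so skip what was already visited.
        if dep in seen or dep == ontology:
            continue
        seen.add(dep)
        stack.extend(imports.get(dep, ()))
    return seen


def aggregated_score(ontology: str,
                     imports: Dict[str, Iterable[str]],
                     scores: Dict[str, float]) -> float:
    """Weakest-link aggregation: the minimum score over the ontology and
    all of its transitive dependencies. Unknown ontologies count as 0.0."""
    members = {ontology} | transitive_dependencies(ontology, imports)
    return min(scores.get(member, 0.0) for member in members)


if __name__ == "__main__":
    # Hypothetical import graph (with a cycle) and hypothetical scores.
    imports = {"A": ["B"], "B": ["C"], "C": ["A"]}
    scores = {"A": 0.9, "B": 0.8, "C": 0.5}
    print(aggregated_score("A", imports, scores))
```

A weakest-link score is deliberately pessimistic; a weighted average that discounts deeper dependencies would be an equally plausible design.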
Skills/Technology: Python (required), RDF, OWL, HTTP
Warm-up tasks
easy
- Download a sample of the latest Archivo ontologies (as N-Triples files using this Databus Collection https://databus.dbpedia.org/denis/collections/latest_ontologies_as_nt_sample/ ) and load them into a Virtuoso instance (see Archivo - Ontology Access for instructions). Write a SPARQL query to count all classes that are defined in the downloaded ontologies. Write a grep or awk command to count all classes directly on the downloaded files and check if there is a difference. For pitfalls, see Archivo Ontolysense - GSoC 2023.
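Since Python is the required skill, the file-based count can also be cross-checked with a few lines of Python. The sketch below is not a reference solution: it assumes that a class is any subject typed as owl:Class or rdfs:Class, it uses a simplified regular expression rather than a full N-Triples parser, and the function name `declared_classes` is hypothetical. It counts distinct subjects (including blank nodes), which is one plausible reason why a plain line count and a SPARQL count may differ.

```python
import re
from typing import Iterable, Set

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# Assumption: both owl:Class and rdfs:Class declarations count as classes.
CLASS_TYPES = {
    "<http://www.w3.org/2002/07/owl#Class>",
    "<http://www.w3.org/2000/01/rdf-schema#Class>",
}
# Simplified pattern: matches triples whose object contains no whitespace,
# which is enough for rdf:type statements (objects are IRIs there).
TRIPLE = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\S+)\s*\.\s*$")


def declared_classes(lines: Iterable[str]) -> Set[str]:
    """Return the distinct subjects that are declared as a class."""
    classes: Set[str] = set()
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == RDF_TYPE and match.group(3) in CLASS_TYPES:
            classes.add(match.group(1))  # a set removes duplicate declarations
    return classes


if __name__ == "__main__":
    sample = [
        f"<http://example.org/A> {RDF_TYPE} <http://www.w3.org/2002/07/owl#Class> .",
        f"<http://example.org/A> {RDF_TYPE} <http://www.w3.org/2002/07/owl#Class> .",
        f"<http://example.org/B> {RDF_TYPE} <http://www.w3.org/2000/01/rdf-schema#Class> .",
    ]
    print(len(declared_classes(sample)))
```

To run it on the downloaded files, the lines of all files can be chained into a single iterable, so that a class declared in several ontologies is counted once.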
medium
- Set up your own instance of Archivo; create a fork of the GitHub project, extend the (still insufficient) dependency tree from Pull Request #29, and integrate it, as a prototype, into the ontology version view of Archivo as a separate dependency “tab”
Project size
- 350 hours
Mentors
Johannes Frey, Dr. Natanael Arndt