Containerized Installers for Data-centric Services using Databus Collections
Project Description:
This GSoC project aims to develop containerized installers for data-centric services utilizing Databus collections. Databus collections provide a framework for managing and sharing datasets across distributed systems, offering versioning, replication, and access control features.
One example application of this project is integrating Databus collections with the Virtuoso Open-Source triple store, a widely used RDF store and SPARQL server. This integration would let an RDF dataset described by a Databus collection be deployed and loaded into a Virtuoso instance automatically within a containerized environment.
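For the loading step, one option inside a Virtuoso container is Virtuoso's RDF bulk loader. `ld_dir` registers the files in a directory that match a pattern for a target graph. `rdf_loader_run()` then loads everything that has been registered, and `checkpoint` persists the result. The minimal sketch below generates such a script. It assumes the download directory is listed under `DirsAllowed` in `virtuoso.ini`. The directory, graph IRI, and file patterns in the example are placeholders, and the helper names are hypothetical.

```python
def sql_string(value: str) -> str:
    """Quote a value as a Virtuoso SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_bulk_load_script(data_dir: str, graph_iri: str, patterns: list[str]) -> str:
    """Return isql statements that bulk-load matching files from data_dir into graph_iri.

    Assumption: data_dir is listed under DirsAllowed in virtuoso.ini;
    otherwise Virtuoso refuses to read files from it.
    """
    # Register every file matching each pattern in the bulk-load queue.
    statements = [
        f"ld_dir({sql_string(data_dir)}, {sql_string(p)}, {sql_string(graph_iri)});"
        for p in patterns
    ]
    # Load everything that was queued, then persist the database to disk.
    statements.append("rdf_loader_run();")
    statements.append("checkpoint;")
    return "\n".join(statements) + "\n"
```

For example, `build_bulk_load_script("/data/toLoad", "http://example.org/graph", ["*.ttl"])` yields a three-statement script. A container entrypoint could write that script to a file and run it with Virtuoso's `isql` client once the server is up. Generating the script, rather than hard-coding it, keeps the graph IRI and file patterns configurable per collection.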
Additionally, the project entails designing and documenting best practices for deploying other Databus-driven services, and implementing further deployment-ready containers. These containers will bundle the components needed to pull data from Databus collections and load it into the associated services, making the services easy to deploy and scale.
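The "pull data from Databus collections" step could be sketched as follows. This is a hedged illustration, and it rests on assumptions not stated in this description. First, requesting a collection URI with `Accept: text/sparql` returns the collection's SPARQL query. Second, the public endpoint `https://databus.dbpedia.org/sparql` answers that query as CSV, with the download URLs in the first column. All function names are hypothetical, and the code uses only the Python standard library.

```python
import csv
import io
import os
import shutil
import urllib.parse
import urllib.request

# Assumption: the public DBpedia Databus SPARQL endpoint; a deployment might
# point this at a different Databus instance.
DATABUS_SPARQL = "https://databus.dbpedia.org/sparql"


def fetch_collection_query(collection_uri: str) -> str:
    """Ask the collection URI for its SPARQL query (assumed to be served as text/sparql)."""
    req = urllib.request.Request(collection_uri, headers={"Accept": "text/sparql"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def run_query_as_csv(query: str, endpoint: str = DATABUS_SPARQL) -> str:
    """POST the query as a form-encoded 'query' parameter and return the CSV result."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(endpoint, data=body, headers={"Accept": "text/csv"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def file_urls_from_csv(csv_text: str) -> list[str]:
    """Return the first column of every result row, skipping the header row."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # the header holds the SPARQL variable name
    return [row[0] for row in rows if row and row[0]]


def local_filename(url: str) -> str:
    """Derive a local file name from the last path segment of a download URL."""
    return os.path.basename(urllib.parse.urlparse(url).path)


def download_all(urls: list[str], target_dir: str) -> list[str]:
    """Download each URL into target_dir and return the local file paths."""
    os.makedirs(target_dir, exist_ok=True)
    paths = []
    for url in urls:
        path = os.path.join(target_dir, local_filename(url))
        with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
            shutil.copyfileobj(resp, out)
        paths.append(path)
    return paths
```

A container entrypoint could chain these steps as `download_all(file_urls_from_csv(run_query_as_csv(fetch_collection_query(uri))), target_dir)` and then hand `target_dir` to the service's own loader. Keeping collection resolution separate from downloading means the same step can be reused across the different service containers.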
Furthermore, the project may explore integration options with the Databus frontend or even with Databus metadata, improving the discoverability and interoperability of the deployed services within the Databus ecosystem.
Key Objectives:
- Integrate Databus collections with the Virtuoso Open-Source triple store as a first use case. This can be done by building on the Virtuoso Quickstarter repository (GitHub - dbpedia/virtuoso-sparql-endpoint-quickstart: creates a docker image with Virtuoso preloaded with the latest DBpedia dataset).
- Design and document best practices for deploying Databus-driven services.
- Implement 4-5 deployment-ready containers for data-centric services that use Databus collections. The services could, for instance, be chosen from the list of Semantic Web applications and services at GitHub - semantalytics/awesome-semantic-web: A curated list of various semantic web and linked data resources.
- Explore integration possibilities with the Databus frontend or metadata systems for enhanced functionality and interoperability.
Expected Outcome:
- A well-documented, Databus-driven Virtuoso Quickstarter container that is easy to deploy.
- Documentation outlining best practices and guidelines for implementing, deploying, and managing Databus-driven services.
- 4-5 containerized installers for deploying data-centric services that use Databus collections.
- A design proposal for integrating these services with the Databus frontend.
- [Optional] Integration with the Databus frontend or even Databus metadata for improved discoverability and usability.
Skills Required:
- A good understanding of SPARQL, RDF, and other Semantic Web technologies.
- Some proficiency in containerization technologies (e.g., Docker, Kubernetes).
- Knowledge of the core concepts of the DBpedia Databus (see Overview - Databus Gitbook).
- Good documentation and communication skills.
Project Size:
Estimated at between 90 and 180 hours, depending on expertise and the number of tasks tackled.