Containerized Installers for Data-centric Services using Databus Collections - GSoC 2024

Project Description:

This GSoC project aims to develop containerized installers for data-centric services utilizing Databus collections. Databus collections provide a framework for managing and sharing datasets across distributed systems, offering versioning, replication, and access control features.

One exemplary application of this project is integrating Databus collections with the Virtuoso Open-Source triple store, a widely used RDF service. This integration enables seamless deployment and loading of RDF datasets into Virtuoso instances within containerized environments.
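As a rough illustration of the loading step, the sketch below generates the statements a container entrypoint might pass to Virtuoso's `isql` client. It relies on Virtuoso's bulk loader procedures `ld_dir` and `rdf_loader_run`; the function name `build_bulk_load_script`, the directory `/data`, and the graph IRI are hypothetical and not taken from the project.

```python
from typing import Iterable


def build_bulk_load_script(
    data_dir: str,
    graph_iri: str,
    file_masks: Iterable[str] = ("*.ttl", "*.nt"),
) -> str:
    """Return isql statements that register and load RDF files from data_dir.

    Assumption: data_dir is readable by the Virtuoso server process and is
    listed under DirsAllowed in virtuoso.ini.
    """

    def quote(value: str) -> str:
        # SQL string literals escape a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    statements = [
        # ld_dir registers every file matching the mask for the target graph.
        f"ld_dir({quote(data_dir)}, {quote(mask)}, {quote(graph_iri)});"
        for mask in file_masks
    ]
    # rdf_loader_run loads all registered files; checkpoint persists the result.
    statements.append("rdf_loader_run();")
    statements.append("checkpoint;")
    return "\n".join(statements)


if __name__ == "__main__":
    # Hypothetical directory and graph IRI, for illustration only.
    print(build_bulk_load_script("/data", "http://example.org/graph"))
```

The generated text could then be piped into `isql` inside the container once the downloaded files are in place.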

Additionally, the project entails both designing and documenting best practices for deploying other Databus-driven services, along with implementing more deployment-ready containers. These containers will encapsulate the necessary components for pulling data from Databus collections and installing them with associated services, ensuring ease of deployment and scalability.
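The "pulling data" part of such a container can be pictured as turning a collection into a list of download URLs. The sketch below assumes that the collection has already been resolved to a SPARQL query and executed, and that the response follows the standard SPARQL 1.1 JSON results format with the file URLs bound to a variable named `file`. The variable name, the helper `extract_file_urls`, and the sample URLs are assumptions made for illustration.

```python
from typing import Any, Dict, List


def extract_file_urls(results: Dict[str, Any], variable: str = "file") -> List[str]:
    """Collect the URI values bound to `variable` in a SPARQL JSON result."""
    urls: List[str] = []
    for binding in results.get("results", {}).get("bindings", []):
        term = binding.get(variable)
        # Skip rows where the variable is unbound or is not a URI.
        if term and term.get("type") == "uri":
            urls.append(term["value"])
    return urls


# Hand-written sample in the SPARQL 1.1 JSON results shape (not real Databus data).
sample_results = {
    "head": {"vars": ["file"]},
    "results": {
        "bindings": [
            {"file": {"type": "uri", "value": "https://example.org/data/a.ttl.gz"}},
            {"file": {"type": "literal", "value": "not a download link"}},
            {"file": {"type": "uri", "value": "https://example.org/data/b.nt.gz"}},
        ]
    },
}

if __name__ == "__main__":
    for url in extract_file_urls(sample_results):
        print(url)
```

Keeping the parsing separate from the download and load steps makes it easier to reuse the same logic across installers for different services.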

Furthermore, the project may explore integration options with the Databus frontend or even metadata, enhancing discoverability and interoperability of the deployed services within the Databus ecosystem.

Key Objectives:

Expected Outcome:

  • A well-documented Databus-driven Virtuoso Quickstarter container that focuses on ease of deployment.

  • Documentation outlining best practices and guidelines for implementing, deploying and managing Databus-driven services.

  • 4-5 Containerized installers for deploying data-centric services leveraging Databus collections.

  • Design proposal for integration of these services with the Databus frontend.

  • [Optional] integration with Databus frontend or even metadata for improved discoverability and usability.

Skills Required:

  • A good understanding of SPARQL, RDF and other Semantic Web technologies

  • Some proficiency in containerization technologies (e.g., Docker, Kubernetes).

  • Knowledge of the core concepts of the DBpedia Databus (see Overview - Databus Gitbook)

  • Good documentation and communication skills

Project Size:

Estimated at between 90 and 180 hours, depending on expertise and the number of tasks tackled.

One main thing here, which is pretty cool, is RDF. So people could pick any RDF dataset from the bus or assemble their dataset and then deploy RDF applications based on the dataset via Docker.

Hey Janforberg,

This is Ronit Banerjee. I took part in GSoC 2023 at DBpedia as a mentee on Edgard Marx’s project, which dealt with Java Spring Boot, Maven, Docker and documentation.
Your project idea in #gsoc2024 caught my eye. I would love to volunteer as a mentor on your project, as I have a good understanding of the tech, how this organisation works, and the culture of open source development.

I am happy to assist you with this.

hi @janfo
I am Surjendu Pal, an open source enthusiast, currently in my final year of college. I have worked with Java-based technologies and have built projects with Java, Spring Boot and Docker. This is my GitHub (surjendu104 (Surjendu) · GitHub). This project suits me well, so I would like to contribute to it. Thanks.

Sincerely
Surjendu Pal
surjendup104@gmail.com