Containerized Installers for Data-centric Services using Databus Collections β€” GSoC 2024

Project Description:

This GSoC project aims to develop containerized installers for data-centric services utilizing Databus collections. Databus collections provide a framework for managing and sharing datasets across distributed systems, offering versioning, replication, and access control features.

One exemplary application of this project is integrating Databus collections with the Virtuoso Open-Source triple store, a widely used RDF service. This integration enables seamless deployment and loading of RDF datasets into Virtuoso instances within containerized environments.

Additionally, the project entails both designing and documenting best practices for deploying other Databus-driven services, along with implementing more deployment-ready containers. These containers will encapsulate the necessary components for pulling data from Databus collections and installing them with associated services, ensuring ease of deployment and scalability.
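To make the "pull data, then install it" step more concrete, here is a minimal sketch of what such a container's entrypoint logic might look like. It is not the project's actual implementation. It assumes that resolving a collection yields a standard SPARQL JSON result whose rows bind a variable named `file` to a download URL, and that the files end up in a directory Virtuoso is allowed to read. The helper names (`extract_file_urls`, `virtuoso_load_script`), the directory and the graph IRI are all hypothetical. The generated statements use Virtuoso's bulk loader procedures `ld_dir` and `rdf_loader_run`.

```python
import json


def extract_file_urls(sparql_json: str) -> list[str]:
    """Collect download URLs from a SPARQL JSON result.

    Assumption: the query behind the collection binds a variable named
    'file'. Rows without that binding are skipped.
    """
    results = json.loads(sparql_json)
    return [
        row["file"]["value"]
        for row in results["results"]["bindings"]
        if "file" in row
    ]


def virtuoso_load_script(data_dir: str, graph_iri: str, mask: str = "*.*") -> str:
    """Build the isql statements that register and bulk-load the files.

    Assumption: data_dir is listed in DirsAllowed in virtuoso.ini.
    """
    return "\n".join(
        [
            f"ld_dir('{data_dir}', '{mask}', '{graph_iri}');",
            "rdf_loader_run();",
            "checkpoint;",
        ]
    )


if __name__ == "__main__":
    # Hand-written sample response, used here instead of a live request.
    sample = json.dumps(
        {
            "head": {"vars": ["file"]},
            "results": {
                "bindings": [
                    {"file": {"type": "uri", "value": "https://example.org/data/a.ttl"}},
                    {"file": {"type": "uri", "value": "https://example.org/data/b.ttl"}},
                ]
            },
        }
    )
    for url in extract_file_urls(sample):
        print("would download:", url)
    print(virtuoso_load_script("/data/import", "http://example.org/graph"))
```

In a real container the download step and the call to `isql` would sit between these two functions. They are left out here because they depend on the chosen base image and on how the Databus endpoint is queried.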

Furthermore, the project may explore integration options with the Databus frontend or even metadata, enhancing discoverability and interoperability of the deployed services within the Databus ecosystem.

Key Objectives:

Expected Outcome:

  • A well-documented Databus-driven Virtuoso Quickstarter container that focuses on ease of deployment.

  • Documentation outlining best practices and guidelines for implementing, deploying and managing Databus-driven services.

  • 4-5 containerized installers for deploying data-centric services leveraging Databus collections.

  • Design proposal for integration of these services with the Databus frontend.

  • [Optional] integration with Databus frontend or even metadata for improved discoverability and usability.

Skills Required:

  • A good understanding of SPARQL, RDF and other Semantic Web technologies

  • Some proficiency in containerization technologies (e.g., Docker, Kubernetes).

  • Knowledge of the core concepts of the DBpedia Databus (see Overview - Databus Gitbook)

  • Good documentation and communication skills

Project Size:

Estimated at between 90 and 180 hours, depending on expertise and the number of tasks tackled.


One main thing here, which is pretty cool, is RDF. So people could pick any RDF dataset from the bus or assemble their dataset and then deploy RDF applications based on the dataset via Docker.

Hey Janforberg,

This is Ronit Banerjee. I was part of GSoC 2023 at DBpedia as a mentee on Edgard Marx’s project, which dealt with Java Spring Boot, Maven, Docker and documentation.
Your project idea in #gsoc2024 caught my eye. I would love to volunteer as a mentor on your project, as I have a good understanding of the tech, how this organisation works and the culture of open source development.

I am happy to assist you with this.


Hi @janfo,
I am Surjendu Pal, an open source enthusiast, currently in my final year of college. I have worked with Java-based technologies and have built projects with Java, Spring Boot and Docker. This is my GitHub (surjendu104 (Surjendu) · GitHub). This project suits me well, so I would like to contribute to it. Thanks.

Surjendu Pal

@ronitblenz I would also join as a mentor on this but it would be great if you could co-mentor this project with me!

@surjendu104 sounds great, thank you for your application. I am currently unsure how exactly the projects are going to be assigned, but I’ll get back to you.

Sorry for the long silence in here!

Awesome! I am in.

Dear @contributors for #gsoc2024

If you need to get started with Semantic Web, you can check out my documentation which I prepared last year while I was a contributor.

Hope this Helps! All the Best!

Hi @janfo and @ronitblenz, I reviewed the project doc and the project sounds interesting. I have some ideas to get started and would be grateful to know what the next steps are.

Chirag Tyagi
