Bringing together LLMs and RDF Knowledge Graphs — GSoC 2024

Goal:

Improve the capabilities of Large Language Models (LLMs such as Gemini or ChatGPT) in interfacing with RDF data and RDF Knowledge Graphs. As a major step toward this, the aim is to build services that allow LLMs to look up concepts in ontologies, using DBpedia Archivo as a flexible source of ontologies.

Tasks:

  • improve an existing term search API based on DBpedia Lookup and DBpedia Archivo, such that it can be integrated via LangChain/ChatGPT/Claude plugins to search for ontologies, classes, and properties
  • load the ontologies into a vector database such that LLMs can find relevant information in vector space
  • integrate popularity information as a means to rank candidates (LOD stats, VoID stats generated e.g. from a SPARQL endpoint)
  • test, measure, and compare the quality/performance (improvement) with the llm-kg-bench framework by extending it and adding evaluation test cases (converting a factsheet or CSV into a KG, writing meaningful queries against a SPARQL endpoint, mapping ontologies or datasets)
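
As a rough illustration of how the vector search and popularity ranking tasks above could fit together, here is a minimal sketch. Everything in it is an assumption made for illustration: the `example.org` terms and their usage counts are invented, the bag-of-words `embed` function is only a stand-in for a real embedding model and vector database, and the `alpha` weighting is one possible way to blend similarity with popularity, not the project's actual design.

```python
import math
from collections import Counter

# Hypothetical ontology terms: (IRI, descriptive text, usage count).
# The counts are invented; real ones might come from LOD or VoID statistics.
TERMS = [
    ("http://example.org/onto#Person", "person a human being", 900),
    ("http://example.org/onto#birthPlace", "birth place where a person was born", 500),
    ("http://example.org/onto#City", "city a large populated place", 300),
    ("http://example.org/onto#birthLocation", "birth location place of birth", 2),
]


def embed(text):
    """Stand-in embedding: a sparse bag-of-words vector of token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, terms=TERMS, alpha=0.8, top_k=3):
    """Rank terms by alpha * similarity + (1 - alpha) * normalized log-popularity."""
    q = embed(query)
    # Log-scale the counts so very popular terms do not dominate completely.
    max_pop = max((math.log1p(p) for _, _, p in terms), default=0.0) or 1.0
    scored = []
    for iri, text, pop in terms:
        sim = cosine(q, embed(text))
        if sim == 0.0:
            continue  # no lexical overlap at all: not a candidate
        popularity = math.log1p(pop) / max_pop
        scored.append((alpha * sim + (1 - alpha) * popularity, iri))
    scored.sort(reverse=True)
    return [iri for _, iri in scored[:top_k]]


if __name__ == "__main__":
    print(search("place of birth"))             # similarity-dominated ranking
    print(search("place of birth", alpha=0.5))  # popularity weighs in more
```

With `alpha=0.8` the rarely used but closely matching term ranks first, while `alpha=0.5` lets the more widely used term overtake it. How to set that balance is exactly the kind of question an evaluation framework would need to answer.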

Project size

  • 175 hours or 350 hours

Mentors

Johannes Frey, Dr. Mahdi Hedayat Mahmoudi, Hannes Hartmann

Hello,

I’m highly interested in the GSoC 2024 project focusing on Semantic Web and NLP for DBpedia. With recent research in NLP and hands-on experience, I’m eager to contribute to bringing together LLMs and RDF Knowledge Graphs. I am excited about the opportunity to collaborate and make a meaningful impact.

Best Regards
Tsirindanis Chrysovalantis

Hey, DBpedia Community
This is Kshitij. I am currently a sophomore (second year) in the Computer Science branch at VJTI, Mumbai. I recently came across the above project on integrating LLMs and RDF, which motivated me to collaborate with you on it.

I have prior experience building a Transformer model from scratch and training it using the NumPy and CuPy libraries, and I have good proficiency in C++ as well as Python. I have also worked with the Gemini API for caption and blog generation.

I understand what the project expects from me, but could you help me figure out what to start with next so that I can get a better grip on it?

Maybe as a starting point you could read through this HowTo, "Linked Data Reasoning & Inference Exercise using CHAT-GPT (Version 4 Engine)" (LD How-Tos and Tutorials, OpenLink Software Community), and perform the easy task in Archivo Ontolysense - GSoC 2023.

Hi all! I am a serial entrepreneur, and we have launched a multi-agent creation platform company together with Ratmir Timashev, billionaire founder of Veeam Software. We are looking for experts in DBpedia and similar ontologies to come work with or for us. Longer version:

I have been strongly convinced for some time that LLMs (and ANNs in general) will at best allow us to emulate Kahneman’s “System 1” (quick, pattern-based, subconscious thinking), but they lack the ability to represent world knowledge and reason about it, which is what System 2 needs.

I am extremely happy to have found this initiative, as combining LLMs with ontologies and reasoning engines is probably a much faster and more efficient way to create really helpful, AGI-like assistants. We would be more than happy to collaborate with any individuals or teams working in a similar direction, be it through joint projects, grants, or potentially full-time employment. If you are experienced in DBpedia, please drop me a note and let’s change the world together :slight_smile:

Best wishes,
Anton Antich

Greetings @jfrey and DBpedia community,

I am a junior-year undergraduate at the Indian Institute of Science Education and Research, Bhopal, currently enrolled in the research program for Economics and Data Science.

I have hands-on experience with vector databases like Pinecone and have also used FAISS to build a custom fine-tuned Mixtral-based chatbot using RAG, which can answer questions related to our college. You can check it out here: IISERB-GPT.

I am also expanding my knowledge by collaborating with Largo Labs, IISER Kolkata, one of the handful of research labs in India working on Large Language Models, to research the efficiency of applying LLMs to mental health therapy via chat, using NLP with semantic keyword mapping of text from patient chats to detect mental health disorders.

I would be elated to contribute to DBpedia, especially this project.

Regards
Dhriman

Hello DBpedia community,

I am a 2nd-year Computer Science MSc student at Faculdade de Ciências da Universidade de Lisboa in Portugal.
I will be applying as a contributor for GSoC 2024 and I am highly interested in joining this project.

Regarding my experience on these subjects: my thesis is focused on using LLMs to improve viral vector development. I am also familiar with knowledge graphs and related technologies as I’m part of LISEDA, a semantic data research lab fundamentally aligned with knowledge graph technologies and have been following other members’ projects for the last 7 months.
While working at LISEDA, I also contributed to the development of Matcha-DL, an augmentation for Matcha (an ontology matching system) that uses a dense neural network to rank candidates based on candidate input score. Additionally, in the following weeks I will be researching more about knowledge graphs to become further acquainted with the subject.

I am looking forward to a chance to contribute to this project.

Best wishes,
Lucas Ferraz

Hi DBpedia Community,

I am a PhD candidate at Vrije Universiteit Amsterdam, and my research interests are in RDF knowledge graph construction and application.

I am currently researching how to leverage LLMs to create a fine-grained scholarly knowledge graph from scientific papers. I have experience with vector databases such as Chroma (an open-source vector database) and with RAG using domain-specific knowledge. Before my current research topic, I worked on scholarly paper recommendation, which familiarized me with information retrieval and similarity measurement for ranking.

I am very interested in this project which could harness the ability of LLMs and RDF KGs to improve this search service in DBpedia.

Kind regards,
Xueli

Hi DBpedia Community and Mentors,

Greetings…

This is Srepadmashiny. I’m a 2nd-year student at the Indian Institute of Technology Guwahati.

I’m a Graph Data Science and LLM enthusiast.

I have prior experience in creating a neural network-based Triplet Ripper from relational unstructured databases and ingesting it into a Knowledge graph along with Sentence embedding and Graph-based embedding.

I have good proficiency in C, C++, and Python. I’m a project member and coordinator at the Coding Club at IIT Guwahati, where I work on ML projects.

I have started working on the warm-up tasks given. I’m very excited and interested in collaborating with you all on this.

Hey

This is Soham, a 2nd-year Master's student in AI at the University of Amsterdam.
I have prior experience as an Applied Scientist at Amazon Search, working on ranking models, fine-tuning LLMs, etc.

I would love to contribute to this project.
LinkedIn: Soham Chatterjee

@jfrey Sir, I have sent you an email; kindly check.

Hi sir, my email ID is padmashinyk@gmail.com, and I have sent an email to your address frey@informatik.uni-leipzig.de.