Agentic Question Answering over DBpedia — GSoC 2026

This project started in 2018 as ‘A Neural QA Model for DBpedia’ and is now heading into its 7th year at Google Summer of Code after a three-year hiatus.

Introduction

Neural SPARQL Machines (NSpM) pioneered end-to-end approaches to answering questions posed by users not versed in writing SPARQL queries. This project takes that vision further with an agentic architecture.

Currently, billions of relationships on the Web are expressed in the RDF format. Accessing such data is difficult for a lay user, who does not know how to write a SPARQL query. This GSoC project consists of building an agentic question answering system over DBpedia, where an LLM-based agent can autonomously plan and execute queries by leveraging a set of tools — including entity linking indexes, ontology indexes, the DBpedia SPARQL endpoint, and other retrieval mechanisms — to answer natural-language questions (as of now restricted to English).
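For concreteness, the SPARQL-endpoint tool can be as small as the sketch below. It uses the SPARQLWrapper library against the public DBpedia endpoint; the query is hand-written here, standing in for what the agent would generate.

```python
# Minimal sketch of the SPARQL-endpoint tool, using the SPARQLWrapper library.
# The query below is hand-written; in the agentic system the LLM constructs it.
from SPARQLWrapper import SPARQLWrapper, JSON

def run_sparql(query: str) -> list[dict]:
    """Run a SPARQL query against DBpedia and return the JSON bindings."""
    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery(query)
    endpoint.setReturnFormat(JSON)
    return endpoint.query().convert()["results"]["bindings"]

# "In which city was Nikola Tesla born?"
for row in run_sparql("""
    SELECT ?place WHERE {
      <http://dbpedia.org/resource/Nikola_Tesla>
        <http://dbpedia.org/ontology/birthPlace> ?place .
    }"""):
    print(row["place"]["value"])
```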

Documentation

Related work

The first three papers introduce and elaborate on Neural SPARQL Machines. The third was carried out by our GSoC 2019 student and published at KGSWC 2020. The fourth is a near-comprehensive survey of related approaches.

  1. SPARQL as a Foreign Language
  2. Neural Machine Translation for Query Construction and Composition
  3. Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition
  4. Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs

GSoC Blogs

You may also check which problems past GSoC contributors worked on:

  1. [GSoC 2018] Aman’s Blog — building raw templates
  2. [GSoC 2019] Anand’s Blog — automating template creation
  3. [GSoC 2020] Zheyuan’s Blog — paraphrasing questions
  4. [GSoC 2021] Siddhant’s Blog — data augmentation
  5. [GSoC 2022] Saurav’s Blog — refining template discovery
  6. [GSoC 2023] Mehrzad’s Blog — fine-tuning code LLMs

Warm-up tasks

  1. Read the Medium post What is a Neural SPARQL Machine? to get a general idea about NSpM.
  2. Read through the most recent blogs and the reading list to get a good understanding of the project. This will allow you to get a good idea about its current state.
  3. Understand the entity linking service that maps strings to lists of entities ranked by confidence (see the sketch after this list).
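As a starting point, a call to the service can look like the sketch below. Note that the endpoint URL, the maxResults parameter, and the response fields are our assumptions about the public DBpedia Lookup API; verify them against the service documentation.

```python
# Sketch of an entity-linking call via DBpedia Lookup. The endpoint URL, the
# maxResults parameter, and the "docs" response schema are assumptions; check
# the Lookup documentation for the exact contract.
import requests

def link_entity(surface_form: str, max_hits: int = 5) -> list[tuple[str, str]]:
    """Return ranked (label, resource URI) candidates for a surface form."""
    resp = requests.get(
        "https://lookup.dbpedia.org/api/search",
        params={"query": surface_form, "maxResults": max_hits},
        headers={"Accept": "application/json"},
        timeout=10,
    )
    resp.raise_for_status()
    # Each doc's fields arrive as lists; take the first value of each.
    return [(d["label"][0], d["resource"][0]) for d in resp.json().get("docs", [])]

print(link_entity("USA"))  # dbr:United_States should rank near the top
```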

Your proposal

Now that you have a good understanding of the current state of the project, we ask you to write your own proposal. The core challenge is designing an agent that can reliably answer natural-language questions over DBpedia by selecting and composing the right tools — entity linking, ontology lookup, SPARQL query construction and execution, result validation, and so on.

You are free to choose the LLM backbone, the agent framework (e.g., LangChain, LlamaIndex, custom), and the tool set. You may propose additional tools beyond those listed above, and you are encouraged to evaluate your system against existing QA benchmarks such as QALD.
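To make “composing the right tools” concrete, here is a deliberately tiny, LLM-free sketch that chains the two tool sketches above for one fixed question shape. In your proposal, the agent (not hard-coded logic) would decide which tool to call next and which property to use.

```python
# Tiny, LLM-free illustration of tool composition: link an entity, build a
# one-triple SPARQL query, execute it, and validate the result. A real agent
# would choose tools and properties dynamically instead of hard-coding them.

def answer_birthplace(person: str) -> list[str]:
    _, uri = link_entity(person)[0]   # tool 1: entity linking (top candidate)
    rows = run_sparql(f"""
        SELECT ?place WHERE {{
          <{uri}> <http://dbpedia.org/ontology/birthPlace> ?place .
        }}""")                        # tool 2: SPARQL construction + execution
    if not rows:                      # tool 3: validation - fail loudly
        raise ValueError(f"no birthplace found for {uri}")
    return [r["place"]["value"] for r in rows]

print(answer_birthplace("Nikola Tesla"))
```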

Project size

The size of this project can be either medium or large. Please state in your proposal the number of total project hours you intend to dedicate to it (175 or 300).

Mentors

@tsoru, @smilingprogrammer, @ronitblenz, @gnav

Feel free to contact us for more information. We eagerly look forward to working with you and contributing towards making data accessible to all.


Hi @tsoru and mentors,

I’m really interested in this project on agentic question answering over DBpedia. I’ve worked on LLM-based NLP systems, including fine-tuning models and building AI-driven applications, and I’m currently exploring agentic frameworks for multi-step reasoning and tool usage.

I’m planning to draft a proposal focusing on designing a reliable agent pipeline that combines entity linking, SPARQL query generation, and validation loops for improved accuracy.

I also wanted to ask:

  • Are there any recommended baselines or prior implementations (beyond NSpM) that you would like contributors to build upon?
  • Would integrating a lightweight RAG-style component alongside SPARQL querying be a reasonable direction?

Looking forward to sharing a detailed proposal soon. Thanks!

Hi @SyedaAlizah,

Please check the DBpedia dataset and participants from the 2025 edition of the Text2SPARQL challenge.

Absolutely! RAG and GraphRAG are more than recommended to address this type of problem.


Hi @tsoru and mentors,

I’m Piyush Gupta, a Computer Science undergraduate from India, and I’m very interested in this Agentic Question Answering over DBpedia project for GSoC 2026.

This idea strongly matches the kind of systems I’ve been exploring recently: LLM-based tooling, structured outputs, agent workflows, and reliable backend integrations.

Over the past few months, I’ve been contributing actively to Scala-based open source projects, especially in areas related to tool execution, structured output, tracing, validation, and system reliability:

  • llm4s — contributed to features such as structured output, tool execution retry / timeout handling, AgentContext, tracing integration, config parsing, and test coverage
  • Typelevel / Feral — contributed to typed serverless/event-model work and built a practical demo around typed event handling, decoding pipelines, and local verification
  • I’m especially interested in systems where LLMs must interact reliably with external tools / APIs, rather than acting as standalone black boxes

From reading the project description, what excites me most here is the challenge of building an agent that can select and compose the right tools over DBpedia — such as:

  • entity linking
  • ontology / schema lookup
  • SPARQL construction
  • query execution
  • answer validation / grounding

This feels like a very meaningful problem because it combines LLM reasoning with symbolic knowledge access, and also raises interesting questions around failure handling, intermediate verification, and answer reliability.

I’m currently going through the warm-up materials and would love to explore this project more seriously.

A couple of questions I had:

  1. Would you prefer contributors to treat this primarily as a Text2SPARQL / tool-orchestration problem, or more as a hybrid QA system that may combine symbolic querying with retrieval-based fallback?
  2. For evaluation, would you recommend starting directly with QALD / Text2SPARQL-style benchmarks, or first validating the system on a smaller set of manually curated DBpedia questions?
  3. Would a modular design (planner / linker / query-builder / validator) be preferred over a more end-to-end agent loop?

Looking forward to learning more and sharing progress soon.

Hi @tsoru, mentors, and all contributors,

I’m Siddharth, an AIML undergrad from India, working on LLM inference along with agentic frameworks and RAG architectures.

I have gone through all the GSoC blogs, from Aman’s 2018 raw-template work through Mehrzad’s 2023 StarCoder fine-tuning, as well as the four NSpM papers, and I’ve also checked out your team’s submission and the other participants’ systems at the Text2SPARQL 2025 challenge.

The evolution of this project is actually pretty clear to me.

Each year solved one layer of the problem, from manual templates all the way to fine-tuned code LLMs. The shift to an agentic architecture this year makes sense as the natural next step: the main bottleneck is no longer translation quality but reliable tool composition, especially entity linking, which has been the number-one error source since Zheyuan’s 2020 work and still shows up in the Text2SPARQL results.

I’ve started working on the warm-up tasks and prototyping a basic pipeline locally.

I do have one question that I would appreciate if clarified.

Since entity linking errors dominate failures across every iteration of this project, would a hybrid approach make sense where the agent tries DBpedia Lookup first, and if confidence is low, falls back to vector similarity over entity embeddings as a GraphRAG-style retrieval? Curious if you’ve seen this work well in practice or if the Lookup service is reliable enough on its own for the QALD-style questions.
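Roughly what I have in mind, as a sketch (this won’t run as-is): the threshold, the `embed` function, and the `entity_index` vector store are placeholders I would flesh out in the proposal.

```python
# Sketch of the hybrid linker; threshold, embed() and entity_index are
# placeholders for a tuned cut-off, a sentence embedder, and a vector store
# pre-built over DBpedia entity labels/abstracts.
def hybrid_link(surface_form: str, threshold: float = 0.7) -> str:
    candidates = lookup(surface_form)        # DBpedia Lookup first
    if candidates and candidates[0].score >= threshold:
        return candidates[0].uri             # confident: accept the top hit
    vec = embed(surface_form)                # low confidence: fall back
    return entity_index.nearest(vec).uri     # GraphRAG-style dense retrieval
```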

Hi @tsoru,

Thanks for the guidance. I went through the warm-up materials, blogs, and the Text2SPARQL 2025 systems.

A couple of takeaways that are shaping my approach:

  • Studying Mehrzad’s 2023 work made it clear that entity linking must be a separate tool: the model consistently failed to map surface forms like ‘USA’ to dbr:United_States on its own.
  • The shift toward modular, tool-based agent architectures in recent systems makes a lot of sense for reliability and debugging.

I’m planning to reflect this in my proposal design.

One quick question:
For ontology/schema lookup, would you recommend querying the DBpedia SPARQL endpoint directly, or is there an existing indexed service similar to the entity linking setup?
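For context, this is the kind of direct-endpoint lookup I mean: a sketch that searches ontology properties by label (the filter keyword is just an example, and a real tool would cache results and handle more languages). It can be executed with any SPARQL client.

```python
# What I mean by querying the endpoint directly: search DBpedia ontology
# properties whose English label contains a keyword. Sketch only.
schema_query = """
SELECT ?prop ?label WHERE {
  ?prop a owl:ObjectProperty ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en" && CONTAINS(LCASE(?label), "capital"))
}
LIMIT 10
"""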

Thanks again, I’ll share my proposal soon!

Hi,

I’ve started exploring the Neural SPARQL Machines (NSpM) repository to better understand the current pipeline for question answering over DBpedia.

As a first step, I submitted a small PR fixing incorrect command examples in the README (Interpreter Module section), which helped me get familiar with how the system works.

From my initial understanding, entity linking and query construction seem to be key challenges in building reliable QA systems over knowledge graphs. I’m currently working on a small prototype to explore an agent-based approach that combines tool usage (entity linking, SPARQL execution) with validation mechanisms.

Looking forward to learning more and contributing further!

Hi, I’m Rohan Cyriac Suraj, a 3rd-year Computer Science undergraduate from India, and I’m very interested in contributing to the Agentic Question Answering over DBpedia project for GSoC 2026.

Over the past few months, I’ve been working on a capstone project focused on knowledge graph construction and LLM-grounded reasoning, where I built an AI-powered research navigator that integrates structured graph data with generative models for more reliable academic exploration.

Although my current system is not agentic, it closely aligns with the foundational challenges of this project, especially entity linking, schema grounding, and reliable knowledge access, which I see as essential for building an effective agent over DBpedia.

What excites me most about this project is the shift from static pipelines like Neural SPARQL Machines to agentic systems that can iteratively plan, use tools, and refine their outputs. I’m particularly interested in multi-step reasoning over knowledge graphs, dynamic SPARQL construction with validation loops, handling ambiguity in entity and relation mapping, and grounding LLM outputs in structured data to reduce hallucination.

From the readings and prior GSoC work, this feels like a natural evolution from template-based and seq2seq approaches toward tool-augmented, self-correcting systems, which aligns strongly with my current work.

I had a few questions regarding direction and alignment:

  1. Would it be beneficial to extend my capstone into a DBpedia-backed agentic QA system by reusing my entity-linking and enrichment pipeline?
  2. In your experience, what are the biggest gaps in current approaches: reasoning and planning, tool reliability, or evaluation robustness?
  3. Would you prefer a modular architecture with components like planner, linker, query builder, and validator, or a more unified agent loop?
  4. How important is incorporating fallback retrieval, such as text or vector search, alongside SPARQL querying for robustness?

I’m currently going through the warm-up materials and related work and look forward to refining my ideas further and sharing a detailed proposal soon.

Hi @tsoru, @smilingprogrammer

I’m Ayaan Ahmed Khan, a CS student at COMSATS University. I’m excited to share that I’ve just officially submitted my GSoC proposal for Agentic Question Answering over DBpedia using LangChain!

As I continue working through the warm-up tasks, I have two quick questions regarding the architecture:

  1. Do you prefer open-source models (Llama-3) to keep the pipeline completely open, or are commercial APIs (Gemini/OpenAI) acceptable for prototyping?
  2. Beyond entity linking and ontology lookup, are there other specific tools you highly recommend adding to the agent’s toolkit?

Thanks for your time, and I look forward to engaging with the community!

Best Regards, Ayaan Ahmed Khan

Hello @tsoru, hope you’re having a great day so far.

I am planning to apply for GSoC 2026 for the Neural SPARQL Machines project.

After going through the NSpM repository, past GSoC work, and recent discussions, it seems the current approaches largely rely on single-shot generation to answer questions.

Personally, I’ve been using local LLMs since Alibaba’s Qwen came out. It could run on my laptop, so I didn’t need new hardware, and I started experimenting with local LLMs a lot.

From my experience, I found the function-calling Gemma model really useful.

It is a tiny 270M-parameter model that can call functions or APIs given a prompt describing what to do. I have a Philips smart bulb that I control through the pywizlight library to change its color or brightness, but I found it annoying that the model did not know the color code for red, #FF0000.

That’s when the idea struck me: I don’t have to pass everything to an LLM and call it a smart system. I implemented a multi-layer system that would first look up the color code, so I could prompt something like “police lights” and the light would flash red and blue like a patrol car.

I faced problems there, like the model slowly degrading after 5-6 prompts as it hit its context window limit, but it still worked as a great proof of concept: a way to make an LLM call APIs without something like an MCP server (which also performs rather poorly with low-end LLMs, which are needed if we want speed and efficient search results).

Therefore, I propose exploring a more agentic approach where query construction is broken into intermediate steps rather than generated in one go (a single-shot system).

I’ve built a mini prototype system where:

  1. The PLANNER decomposes the question into structured sub-tasks.
  2. DBpedia properties are dynamically discovered at runtime.
  3. A tiny LLM selects the most relevant property from candidates.
  4. Execution is performed iteratively, enabling simple multi-hop reasoning.

The GOAL is to move from single-shot query generation to step-wise, tool-grounded answering.
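In simplified Python, the loop looks roughly like this (every helper name is a placeholder for one of the prototype’s components):

```python
# Simplified skeleton of the prototype; every helper here is a placeholder.
# Each iteration resolves one hop: current subject -> property -> new subject.
def multi_hop_answer(question: str) -> str:
    subtasks = planner_decompose(question)     # 1. planner: ordered sub-tasks
    subject = seed_entity(question)            #    e.g. dbr:Elon_Musk
    for task in subtasks:
        props = discover_properties(subject)   # 2. live property discovery
        prop = select_property(task, props)    # 3. tiny LLM picks a property
        subject = execute_hop(subject, prop)   # 4. one SPARQL triple per hop
    return subject
```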

I had a couple of questions:

  1. From your experience, is improving planning/decomposition or improving tool reliability (for example, property selection) the more impactful direction?
  2. Would adding a verification/retry mechanism for failed steps align with the project goals? As a downside, this would increase query time.

Happy to share the prototype here, but it seems I am not allowed to post links yet.

I managed to make it answer questions like “What is the capital city of the home country of Elon Musk?”, which gives very bad results when run against a single-shot system.

In my proposed architecture, the system first splits the question into tasks: it finds the initial subject, i.e., Elon Musk, then finds his home country in the second loop, and finally the capital city in the third loop.

This way, we can get better results than asking the entire question in a single-shot way.
Excuse my English, and thank you.