Training a Model for Neural Question Answering over DBpedia — GSoC 2024

This project started in 2018 as ‘A Neural QA Model for DBpedia’ and is now looking forward to its 7th consecutive year at Google Summer of Code.

Introduction

Neural SPARQL Machines (NSpM) aim to build an end-to-end system that answers questions posed by users who are not versed in writing SPARQL queries.

Currently, billions of relationships on the Web are expressed in the RDF format. Accessing such data is difficult for a lay user, who does not know how to write a SPARQL query. This GSoC project builds upon the NSpM question answering system, which tries to make this vast body of linked data accessible to a larger user base in their natural language (as of now restricted to English). Contributors will improve, extend, and amend the existing codebase, which resides at the link below.
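To make the task concrete, here is a minimal sketch of how a question template and its SPARQL counterpart could be turned into one training pair. The `Template` class, the `instantiate` helper, the `<A>` placeholder convention as written here, and the example entity are all illustrative assumptions, not code taken from the NSpM repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """A question pattern paired with a SPARQL pattern; <A> marks the entity slot."""
    question: str
    query: str


def instantiate(template: Template, label: str, uri: str) -> tuple[str, str]:
    """Fill the <A> slot with an entity label (question side) and its URI (query side)."""
    question = template.question.replace("<A>", label)
    query = template.query.replace("<A>", f"<{uri}>")
    return question, query


# Hypothetical template and entity, chosen only for illustration.
template = Template(
    question="Who is the author of <A>?",
    query="SELECT ?x WHERE { <A> <http://dbpedia.org/ontology/author> ?x }",
)
pair = instantiate(template, "Dracula", "http://dbpedia.org/resource/Dracula")
print(pair[0])  # the natural-language side of the pair
print(pair[1])  # the SPARQL side of the pair
```

A model trained on many such pairs is then expected to produce the query side when given only the question side.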

Documentation

Related work

The first 3 papers introduce and elaborate on Neural SPARQL Machines. Work number 3 was carried out by our GSoC 2019 student and published at KGSWC 2020. The 4th paper is an almost-complete survey of related approaches.

  1. SPARQL as a Foreign Language
  2. Neural Machine Translation for Query Construction and Composition
  3. Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition
  4. Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs

GSoC Blogs

You may also check which problems past GSoC contributors worked on:

  1. [GSoC 2018] Aman’s Blog — building raw templates
  2. [GSoC 2019] Anand’s Blog — automating template creation
  3. [GSoC 2020] Zheyuan’s Blog — paraphrasing questions
  4. [GSoC 2021] Siddhant’s Blog — data augmentation
  5. [GSoC 2022] Saurav’s Blog — refining template discovery
  6. [GSoC 2023] Mehrzad’s Blog

Warm-up tasks

  1. Read the Medium post What is a Neural SPARQL Machine? to get a general idea about NSpM.
  2. Read through the most recent blogs and the reading list to get a good understanding of the code and of the project as a whole.
  3. Run the pipeline in the ./gsoc/mehrzad folder of the base repository using examples of your choice.

Your proposal

Now that you have a good understanding of the current state of the project, we ask you to write your own proposal. Feel free to bring your own solutions to tackle the problem that the project currently faces, i.e. training a question-answering model using the dataset we have built over the years.

Although the original paper mentions a seq2seq model, the NSpM paradigm allows us to choose any model as our Learner to translate natural-language questions into SPARQL. You may even propose your own model or one from any other community (e.g., HuggingFace).
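One way to picture a swappable Learner is as a small interface that the rest of the pipeline depends on. The sketch below is an assumption made for illustration: the names `Learner`, `LookupLearner`, and `answer_query` are hypothetical and do not come from the NSpM codebase, and the lookup table merely stands in for a trained model.

```python
from typing import Protocol


class Learner(Protocol):
    """Anything that maps a natural-language question to a SPARQL string."""

    def translate(self, question: str) -> str: ...


class LookupLearner:
    """Toy stand-in for a trained model: answers only questions it has memorised."""

    def __init__(self, pairs: dict[str, str]) -> None:
        # Normalise keys so that casing and surrounding spaces do not matter.
        self._pairs = {q.lower().strip(): s for q, s in pairs.items()}

    def translate(self, question: str) -> str:
        key = question.lower().strip()
        if key not in self._pairs:
            raise KeyError(f"no query known for: {question!r}")
        return self._pairs[key]


def answer_query(learner: Learner, question: str) -> str:
    """Calling code relies only on the Learner interface, not on a concrete model."""
    return learner.translate(question)


# Hypothetical question/query pair used only to exercise the interface.
learner = LookupLearner({
    "Who is the author of Dracula?":
        "SELECT ?x WHERE { <http://dbpedia.org/resource/Dracula> "
        "<http://dbpedia.org/ontology/author> ?x }",
})
print(answer_query(learner, "who is the author of Dracula?"))
```

Replacing `LookupLearner` with a wrapper around a seq2seq or pretrained language model would leave `answer_query` unchanged, which is the point of keeping the Learner behind an interface.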

Project size

The size of this project can be either medium or large. Please state in your proposal the number of total project hours you intend to dedicate to it (175 or 300).

Mentors

@panchbhai1969, @mehrzadshm, TBD

Feel free to contact us for more information. We eagerly look forward to working with you and contributing towards making data accessible to all.

Hi @panchbhai1969, @mehrzadshm

I am Vedant Udan, a bachelor’s student at IIT Bhilai. During my bachelor’s, I have worked a lot on LLMs and NLP-related tasks, such as fine-tuning models, prompt engineering, and many other NLP tasks.
Given my previous experience, I find this project particularly interesting and want to explore how LLMs, along with the awesome NLP tools available, can help solve this problem.

Looking forward to the amazing opportunity.

Hi @panchbhai1969 @mehrzadshm, I came across this project and it seems quite interesting. I have a solid foundation in ML and I am also a Microsoft Certified Solutions Developer for Natural Language Processing. I would love to get further details and discuss my proposal for the same. Looking forward to working with you. Also, please provide some contact details.

Thanks and regards
Chirag Tyagi
tyagichirag06@gmail.com

Hey @VedantUdan, go through the blogs and code, and complete the warm-up tasks.

Get a feel for the project. Then suggest an area you want to explore in a proposal and share it here as a Google Doc link.

The sooner you share your draft proposal, the better we will be able to improve it before the final submission.

Hi @chiragtyagi2003, you can join our Slack channel to communicate with the mentors. It is awesome to see you here…

Pro tip

  1. Check out the blogs and code.
  2. Complete the warm-up tasks.
  3. Share a draft proposal of the idea you want to work on with us this summer.

Slack link: dbpedia.slack.com

Here are some sample proposals for you:

Hey @panchbhai1969
This is Soham, currently pursuing my Master’s in Artificial Intelligence at the University of Amsterdam. I would love to work on this research project. I have professional experience as an Applied Scientist at Amazon, a Fraud Analyst at OneCard, and a Software Developer at Oracle.
Prior to this, I also completed a Master’s at the Indian Statistical Institute with a specialization in Data Science.
LinkedIn: Soham Chatterjee

How can I join the Slack channel? It says: “It looks like there isn’t an account on DBpedia tied to this email address.”

I will go over the project in more detail. Let’s connect on Slack and discuss further.

Hi @panchbhai1969, @mehrzadshm, @tsoru, @sauravjoshi23, @sanjudbpedia

Could you please provide me with your email address or suggest another convenient medium where I can share my draft proposal with you for review?

Your insights and suggestions would be invaluable in refining my proposal and increasing its chances of success in the selection process.

Please create a Google doc, so we can leave comments, and share it with my account (mommi84 at gmail dot com).

Please try again at this link:
Slack

Hi @panchbhai1969 , @mehrzadshm
I am Alexander Osadolor. I just completed a Data Science bootcamp at HyperionDev, and I am currently pursuing a master’s degree at Teesside University. I find this topic intriguing and would love to be a part of it.
Regards

Hi @alexosayi, you can join our Slack channel to communicate with the mentors. It is awesome to see you here…

Pro tip

  1. Check out the blogs and code.
  2. Complete the warm-up tasks.
  3. Share a draft proposal of the idea you want to work on with us this summer.

The submission date is approaching fast.

Slack link: Slack