A Neural QA Model for DBpedia - GSoC2021

tsoru · February 3, 2021, 1:38pm

This project started in 2018 and is now in its fourth consecutive year at DBpedia’s Google Summer of Code.

Introduction

Neural SPARQL Machines aim at building an end-to-end system to answer questions posed by users who are not versed in writing SPARQL queries.

Currently, DBpedia hosts billions of data points and the relations between them in the RDF format. Accessing this data is difficult for lay users who do not know how to write a SPARQL query. This GSoC project consists of building upon the NSpM question answering system, which aims to make this vast body of linked data accessible to a larger user base in their natural language (currently restricted to English), by improving, extending, and amending the existing codebase.
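To make the task concrete, the sketch below shows how a single question-query template pair could be instantiated with a DBpedia entity to yield one natural-language/SPARQL training pair. The template strings, the `<A>` placeholder, and the `instantiate` helper are hypothetical illustrations, not the template format actually used by NSpM; only the `dbo:`/`dbr:` prefixes are standard DBpedia conventions.

```python
from typing import Tuple

# Hypothetical template pair: "<A>" is a placeholder for a DBpedia entity.
QUESTION_TEMPLATE = "where was <A> born?"
QUERY_TEMPLATE = "SELECT ?x WHERE { <A> dbo:birthPlace ?x }"


def instantiate(entity_label: str, entity_uri: str) -> Tuple[str, str]:
    """Fill the placeholder with a human-readable label on the question side
    and a prefixed URI on the query side, producing one training pair."""
    question = QUESTION_TEMPLATE.replace("<A>", entity_label)
    query = QUERY_TEMPLATE.replace("<A>", entity_uri)
    return question, query


if __name__ == "__main__":
    q, s = instantiate("Albert Einstein", "dbr:Albert_Einstein")
    print(q)  # where was Albert Einstein born?
    print(s)  # SELECT ?x WHERE { dbr:Albert_Einstein dbo:birthPlace ?x }
```

Repeating this over many entities that share the same property is what turns one hand-written template into a larger training set.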

Documentation

Source Code

The latest codebase is available at this forked repo: https://github.com/dbpedia/neural-qa

Blogs

To better understand the project please look into the following links:

[GSoC 2018] Aman’s Blog: https://amanmehta-maniac.github.io/
[GSoC 2019] Anand’s Blog: https://anandpanchbhai.com/A-Neural-QA-Model-for-DBpedia/
[GSoC 2020] Zheyuan’s Blog: https://baiblanc.github.io/

Reading Material

The first three papers introduce and elaborate on Neural SPARQL Machines. The third was carried out by our GSoC 2019 student and published at KGSWC 2020. The fourth paper is a near-comprehensive survey of related approaches.

SPARQL as a Foreign Language: https://arxiv.org/abs/1708.07624
Neural Machine Translation for Query Construction and Composition: https://arxiv.org/abs/1806.10478
Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition: https://arxiv.org/abs/2010.10900
Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs: https://arxiv.org/pdf/1907.09361.pdf

Warm-up tasks

Read through the blogs and the reading list to get a good understanding of the project and its codebase.
Run the pipelines in the ./gsoc/anand and ./gsoc/zheyuan folders of the base repository using examples of your choice.

Your proposal

Now that you have a good understanding of the current state of the project, we suggest building your proposal around some of the following points. Feel free to bring your own solutions to the problems the project faces.

How can we automatically build the right question from the property label only?
- example a) from <s> dbo:birthPlace <o> infer where was <s> born?
- example b) from <s> dbo:timeZone <o> infer what time zone is <s> in?
How can we automatically build question-query templates that feature one or more of the following?
- subordinate clauses or genitive: which / that / of / ’s
- con-/disjunctions: and / or / as well as
- modifiers: which + mod / what + mod / demonyms
- comparative: more than / -er than
- superlative: most … / -est
- numeric / quantitative: how many / long / tall
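As a possible starting point for the first question, a purely rule-based baseline might split a property's camelCase local name into words and choose a question frame from the head noun. The sketch below is hypothetical: the `PARTICIPLES` and `CONTAINER_NOUNS` lexicons and the function names are illustrative assumptions, not the algorithm used in the existing pipelines.

```python
import re
from typing import List

# Hypothetical lexicons; a real system would need far larger coverage.
PARTICIPLES = {"birth": "born"}                              # noun -> participle
CONTAINER_NOUNS = {"zone", "country", "region", "state"}     # "what X is <s> in?"


def split_camel_case(local_name: str) -> List[str]:
    """Split a camelCase local name into lowercase words, e.g. 'birthPlace' -> ['birth', 'place']."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", local_name)]


def question_from_property(prop: str, subject: str = "<s>") -> str:
    """Build a question for the pattern <s> prop <o> from the property label alone."""
    local = prop.split(":", 1)[-1]          # drop a prefix such as "dbo:"
    words = split_camel_case(local)
    head, modifiers = words[-1], words[:-1]
    label = " ".join(words)
    # Rule 1: "<event>Place" with a known participle -> "where was <s> <participle>?"
    if head == "place" and len(modifiers) == 1 and modifiers[0] in PARTICIPLES:
        return f"where was {subject} {PARTICIPLES[modifiers[0]]}?"
    # Rule 2: container-like head nouns -> "what <label> is <s> in?"
    if head in CONTAINER_NOUNS:
        return f"what {label} is {subject} in?"
    # Fallback: generic genitive frame.
    return f"what is the {label} of {subject}?"


if __name__ == "__main__":
    for p in ("dbo:birthPlace", "dbo:timeZone", "dbo:populationTotal"):
        print(p, "->", question_from_property(p))
```

A lexicon like this has to be extended by hand for every new property pattern, which is one reason to consider learned approaches.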

Consider experimenting with advanced approaches such as GPT-2 or BERT.

Mentors

@panchbhai1969, @tsoru, TBD

Feel free to contact us for more information. We eagerly look forward to working with you and contributing towards making data accessible to all.

riyabelle25 · February 3, 2021, 7:16pm

Hey there! I’m Riya Elizabeth John, a sophomore from IIT Roorkee, India.
I have a strong interest in this domain and am part of the Vision and Language Group (a DL research group) at my institute.
Thrilled to see the ideas released by DBpedia this year, I’ll get started on the reading material ASAP. Really excited to work with you guys!

tsoru · February 5, 2021, 5:03pm

Hi @riyabelle25 and welcome!

Glad to see you interested in this project.

siddhantjain07 · February 16, 2021, 4:43am

Hello, @tsoru @panchbhai1969.
I am Siddhant Jain, a pre-final year student from Pune. Judging from the overall concept of the project and the previous blogs, it seems really intuitive and interesting.
I have been going through the references and feel positive about it.
Hoping to have a steep learning curve this summer.

tsoru · February 19, 2021, 10:53am

Welcome, @siddhantjain07, and thank you for your interest in the project.

saarahasad · March 10, 2021, 12:07am

@tsoru @panchbhai1969
Hello everyone! I’m Saarah (website) from India, currently pursuing a Master of Technology in Computer Science at MSRIT, India. I’m excited and look forward to contributing and being valuable to DBpedia.

I have looked through the details and references of the project mentioned here. I’ve been working on the warm-up tasks and was able to run some examples as well.

A few follow up questions I had to confirm my understanding:

Is the end goal of problem 1 to refine the training/test data?
Is problem 2 focused on furthering the work done by @panchbhai1969 here https://anandpanchbhai.com/A-Neural-QA-Model-for-DBpedia/WeekSix to generate natural-language question and SPARQL query templates with new techniques?

Have a great day!

tsoru · March 11, 2021, 2:38pm

Hi @saarahasad, welcome to the Forum!

Yes, to refine it, augment it, and expand its question coverage.

Anand created templates using a rule-based algorithm, whereas Zheyuan exploited transformers to make the questions sound more natural. Based on this, we want the candidates to explain how they would solve the problem building on work that has been already done.

saarahasad · March 11, 2021, 4:40pm

Alright, will work towards that. Thank you!

tsoru · March 27, 2021, 9:58pm

To @riyabelle25 @siddhantjain07 @saarahasad and everyone else interested in the project.

On the 29th of March, the Google Summer of Code website will open the submission window for your proposals, which will remain open until the 13th of April.

Please draft a Google Doc along the lines of this example of an excellent proposal and share it with editor access with me (mommi84 at gmail dot com). The other co-mentors and I will help you prepare your project proposal.

Important: Please note that the total number of project hours has changed from 300 to 175. We do not expect your project to be as extensive as in previous editions.

tsoru · April 5, 2021, 2:09pm

An update for @riyabelle25 @siddhantjain07 @saarahasad and everyone else interested in the project.

As of today, we have received ZERO submissions for this project, so there are still plenty of chances for you to get accepted in this year’s Summer of Code.

You have a little more than a week to submit your proposals. Please send your draft to me as soon as possible, so that I can give you feedback before the deadline.

rishavmishrarm · April 5, 2021, 3:22pm

Hello, @tsoru @panchbhai1969.
I am Rishav Mishra, a pre-final year student from SRM University. The overall concept of the project is pretty interesting, and I have worked on some projects.
I have gone through some of its techniques and found them very interesting.
I hope to get the chance to work on this project.

tsoru · April 6, 2021, 10:26am

Hi @rishavmishrarm and thanks for your interest in the project.

saarahasad · April 8, 2021, 4:17am

@tsoru Hi! I’ve been working on this for a while. I couldn’t send in my proposal draft because I fell extremely ill and have just recovered. I will send in my draft today. Thank you!

tsoru · April 8, 2021, 6:34pm

No worries, @saarahasad! I hope you will be able to share your draft on time. Take care!