🤖 A Neural QA Model for DBpedia: Compositionality - GSoC2020

Introduction

Neural SPARQL Machine is a project that deals with building an end-to-end system to answer questions posed by users who are not versed in writing SPARQL queries.

DBpedia currently hosts billions of data points and their corresponding relations in RDF format. Accessing this data is difficult for a lay user who does not know how to write a SPARQL query. This proposal builds upon an existing system (https://github.com/AKSW/NSpM/tree/master), which tries to make this vast body of linked data available to a larger user base in their natural language (currently restricted to English), by improving, extending, and amending the existing codebase.
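To make the question-to-query idea concrete, here is a minimal sketch of how one natural-language question can be paired with a SPARQL query through a shared template. The template strings, the `<A>` placeholder convention, and the `instantiate` helper are illustrative assumptions, not the repository's actual code. The `dbo:` and `dbr:` prefixes are assumed to be predefined, as they are on the public DBpedia endpoint.

```python
# Hypothetical template pair: <A> marks the slot that an entity fills in.
QUESTION_TEMPLATE = "where was <A> born?"
QUERY_TEMPLATE = "SELECT ?x WHERE { <A> dbo:birthPlace ?x }"


def instantiate(label, resource):
    """Fill the entity slot in both the question and the query.

    label    -- human-readable entity name used in the question
    resource -- prefixed DBpedia resource used in the query
    """
    question = QUESTION_TEMPLATE.replace("<A>", label)
    query = QUERY_TEMPLATE.replace("<A>", resource)
    return question, query


question, query = instantiate("Albert Einstein", "dbr:Albert_Einstein")
print(question)
print(query)
```

Many such pairs, generated over different entities, could serve as parallel training data for a translation model that maps questions to queries.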

Source Code and Documentation

The latest code-base is available here: GitHub - dbpedia/neural-qa: 📚 A Neural QA Model for DBpedia using Neural SPARQL Machines.

Blogs

To better understand the project please look into the following links:

  1. [GSoC 2018] Aman’s Blog: https://amanmehta-maniac.github.io/
  2. [GSoC 2019] Anand’s Blog: A Neural QA Model for DBpedia | Making data accessible to everyone

Reading Material:

  1. SPARQL as a Foreign Language: [1708.07624] SPARQL as a Foreign Language
  2. Neural Machine Translation for Query Construction and Composition: [1806.10478] Neural Machine Translation for Query Construction and Composition
  3. Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs: https://arxiv.org/pdf/1907.09361.pdf

Warm up tasks:

  1. Read through the blogs and the reading list. This will give you a good understanding of the project and its code.
  2. Run the pipelines in the gsoc/anand folder of the repository mentioned above for an ontology of your choice.

Ideas

Now that you have a good understanding of the current state of the project, we suggest that you build your proposal around some of the following points. Feel free to bring your own solutions to the problems the project faces.

  1. Structure of the questions.
    Basic Graph Pattern (BGP)
    1. subordinate clauses or genitive (which / that / of / ’s)
    2. con-/disjunctions (and / or / as well as)
    3. modifiers (which + mod / what + mod / demonyms)
    4. comparative (more than / -er than)
    5. superlative (most … / -est)
    6. numeric / quantitative (how many / long / tall)
  2. Tackling out-of-vocabulary words
  3. Using word embeddings
  4. Integrating fastText
  5. Updating the code-base to Python 3
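As an illustration of the first idea, here is a rough sketch of how a genitive chain ("the X of the Y of Z") could be composed into a multi-triple Basic Graph Pattern. The `Relation` class and the `compose_chain` function are hypothetical names invented for this example, and the approach is a simplification for discussion, not how the existing generator works. The `dbo:` and `dbr:` prefixes are assumed to be predefined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    phrase: str      # natural-language phrase, e.g. "author"
    predicate: str   # ontology property, e.g. "dbo:author"


def compose_chain(relations, label, resource):
    """Build a question and a query from nested 'of' relations.

    relations is ordered from the outermost phrase to the innermost one,
    matching the reading order of the question.
    """
    # Question: wrap the entity label from the inside out.
    phrase = label
    for rel in reversed(relations):
        phrase = f"the {rel.phrase} of {phrase}"
    question = f"what is {phrase}?"

    # Query: the innermost relation applies to the resource first, and each
    # intermediate result is bound to a fresh variable (?v0, ?v1, ...).
    inner_first = list(reversed(relations))
    patterns = []
    subject = resource
    for i, rel in enumerate(inner_first):
        obj = "?x" if i == len(inner_first) - 1 else f"?v{i}"
        patterns.append(f"{subject} {rel.predicate} {obj}")
        subject = obj
    query = "SELECT ?x WHERE { " + " . ".join(patterns) + " }"
    return question, query


chain = [Relation("birth place", "dbo:birthPlace"), Relation("author", "dbo:author")]
question, query = compose_chain(chain, "The Hobbit", "dbr:The_Hobbit")
print(question)
print(query)
```

The same pattern could in principle be extended to conjunctions by joining several triple patterns on the same subject, but that is left out here to keep the sketch short.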

Feel free to contact us for more information. We eagerly look forward to working with you and contributing towards making data accessible to all.

Hi! I’d be interested in working on this during GSoC. Can we submit more than one proposal to the same organization, by any chance?

Sure 🙂

Right, thanks! For this project, should I contact the mentor and start working on a proposal now?

Hi @diogenesis,

Before starting on the proposal, I would suggest that you read the papers and complete the warm-up tasks. Doing so will help you write a good proposal. Feel free to ask questions here.

Hi. I am interested in this project, but I have some issues running it because of the TensorFlow version. I see that this project was developed with Python 2 and TensorFlow 1.12, but my PC runs Windows, and the most recent version available for Python 2.7 on Windows is TensorFlow 1.10. Do I need a Linux environment? Or do you have any suggestions?

It’s no longer a problem; I’ve set up a Linux VM and it works well 😉

Sounds good. Keep us posted as you progress.

Hello, I’m wondering whether the project has progressed any further since last year’s GSoC, beyond what is mentioned in Anand’s blog?

Hi @baizydl,

We did have multiple discussions after the GSoC period ended. Some of the points discussed have been added to the Ideas section in the topic description above.

Thanks, @panchbhai1969. I will try to work on the first draft of my proposal. By the way, do we need to have a pull request merged to be a candidate?

Sure, do share the draft proposal with us (Recommended platform: Google Docs, share with us privately).

As far as pull requests and merges are concerned, they are not compulsory. But we do encourage you to interact with the code and create pull requests for any small issues you come across.

Hi, are this project and the one named “Multilingual Neural QA” the same?

Hi @nikhit,

I assume you are referring to this: DBpedia Neural Multilingual QA - GSoC2020.

At first glance they may seem similar, but if you take a closer look you will find that this project focuses on the aspect of NSpM that deals with handling a wide range of compositional (complex) questions, currently limited to English. (Hint: check out the Basic Graph Patterns and other ideas in the topic description above.)

The project you are referring to, in contrast, focuses on the multilingual aspect of Neural QA, extending the NSpM framework to cope with the multilingualism challenge as stated on the corresponding page. You may find more information about DBNQA here: https://github.com/AKSW/DBNQA.

Hi, I submitted my proposal and sent an email. Have you read them?

Hi Jason,

Indeed, I have gone through your proposal. Please give us comment access so that I can answer the questions you asked in the doc as comments.

OK, I changed the settings.

Hello, my name is Mahesh Kulkarni. I am currently in the final year of my B.Tech degree at Vishwakarma Institute of Technology, Pune, India. I have some prior experience with NLP and deep learning. I am interested in this project and want to contribute to it. I have gone through the warm-up tasks. Are there any further instructions that would help me understand the project better?
Thank you

Hi Mahesh,

Sounds great! The description above contains all the information you need to get started. Draft a proposal with your ideas for this project and share it with us.

Thanks for the quick reply. In your blog, under “Future aspects of this project”, you mention “Working on variable awareness”. Could you elaborate on this so I can get a better idea?
Also, some SPARQL learning resources would be helpful for me.