DBpedia Live Neural Question Answering Chatbot - GSoC2021

Description
In this topic, the student will implement and deploy a live chatbot trained on the DBpedia Neural Question Answering (DBNQA) dataset [1].

[1] https://github.com/AKSW/DBNQA

Goal
Create a live DBpedia Neural Question Answering chatbot based on DBNQA and NSpM.

Impact
(1) Facilitate access to DBpedia content;
(2) Enable community evaluation and feedback of DBpedia NSpM models.

Warm-up tasks
(1) Fork the NSpM project (https://github.com/AKSW/NSPM);
(2) Train models on the Monument 300 and Monument 600 datasets (https://github.com/AKSW/NSpM/tree/master/data);
(3) Fork DBNQA and train the model using a subset (the first 30 lines) of the dataset (https://github.com/AKSW/dbnqa); a subsetting sketch follows this list;
(4) Instantiate the NSpM Telegram Chatbot: https://github.com/AKSW/NSpM/wiki/NSpM-Telegram-Bot
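For warm-up task (3), a minimal sketch of how the 30-line subset might be built (file names are placeholders; DBNQA ships parallel English/SPARQL files, so both sides must be cut identically to keep the pairs aligned):

```python
# Take the first N lines of a parallel English/SPARQL corpus,
# keeping the two files aligned line by line.
# "data.en" / "data.sparql" are placeholder paths into the DBNQA dump.
N = 30

for suffix in ("en", "sparql"):
    src = f"data.{suffix}"
    dst = f"subset_30.{suffix}"
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for i, line in enumerate(fin):
            if i >= N:
                break
            fout.write(line)
```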

Reference Material


Mentors
Edgard Marx
Lahiru Hinguruduwa
Thiago Castro Ferreira

Keywords
#NSpM #DBpedia #Chatbot #AI #ConversationalAI


Hi, I'm interested in this project and would like to know more details. Should I go ahead and start with the warm-up tasks?

Hi @emarx, I'm an undergrad student from India and have been working on NLP and knowledge graphs in a recent project on news correlation. I'm highly interested in this project and am starting with the warm-up tasks. Looking forward to a great learning experience with you.

Sure, go ahead.
I will probably add a successful proposal here as well.

Thanks for your interest.

Let me know if you have any questions.

Looking forward…

Hi @emarx, in NSpM I generated the dataset using the monument template. I got the flow: template.csv is broken down into annotations, the annotations are used to build a generator query, that query is run to fetch results, and the resulting bindings are used to generate pairs of English questions and SPARQL queries. What I don't get the complete hang of is
prepare_generator_query(template, add_type_requirements=True, do_special_class_replacement=True),
i.e. how the complete working generator query is created and what kinds of queries there are. If there is any documentation or examples for such queries, it would be helpful.
Please correct me if I'm wrong somewhere in the flow.
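To make the question concrete, this is roughly how I picture the instantiation step (a hand-written sketch with made-up names, not the actual NSpM code):

```python
# Sketch of the template-instantiation flow as I understand it.
# All names here are illustrative, not the real NSpM API.
template_en = "where is <A> located"
template_sparql = "SELECT ?loc WHERE { <A> dbo:location ?loc }"

# A generator query fetches candidate entities (and labels) for the
# placeholder <A>, e.g. monuments that have an English label.
generator_query = """
SELECT ?a ?label WHERE {
  ?a a dbo:Monument ; rdfs:label ?label .
  FILTER(lang(?label) = "en")
} LIMIT 100
"""

def instantiate(bindings):
    """Turn endpoint bindings into (question, query) training pairs."""
    pairs = []
    for b in bindings:
        question = template_en.replace("<A>", b["label"])
        query = template_sparql.replace("<A>", f"<{b['a']}>")
        pairs.append((question, query))
    return pairs

# Example binding as it might come back from the SPARQL endpoint:
print(instantiate([{"a": "http://dbpedia.org/resource/Eiffel_Tower",
                    "label": "Eiffel Tower"}]))
```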
Thanks.

Have a look at the DBNQA repository.
I recommend reading the research papers referenced there,

as well as the NSpM papers, so you know how the instantiation works.

Sure, thank you.

Hi @emarx, I'm almost done with task 1. From the paper I understood the flow of training the translation model and the architecture of the attention model used, but when I generate SPARQL from English through the model, it outputs out-of-vocabulary tokens. I tried it with the training data as well and the result is the same. What could be the problem here?
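In case it helps to debug, here is a small check I would run to see how many input tokens are actually covered by the model's vocabulary (the "vocab.en" path is an assumption based on the usual nmt preprocessing layout):

```python
# Count how many tokens of a question fall outside the training vocabulary.
def load_vocab(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f}

def oov_tokens(question, vocab):
    return [tok for tok in question.lower().split() if tok not in vocab]

vocab = load_vocab("vocab.en")  # source-side vocab file (assumed path)
print(oov_tokens("where is the eiffel tower located", vocab))
```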

Also, for warm-up task 2: there are different datasets in DBNQA (art, sport, etc.), so for training should I use the first 30 lines from all the files, or from one specific file, as the last task was for monuments?

Thank you.

An update on this one: I wasn't able to get output from the model I trained from the NSpM repository (https://github.com/AKSW/NSPM), so instead I cloned the NSpM Telegram Bot (https://github.com/AKSW/NSpM/wiki/NSpM-Telegram-Bot), which had pre-trained models, and used that one to generate the query. The code was in Python 2, so I converted it to Python 3, and now I'm able to get the right query.

Here is the updated code repo: https://github.com/ashutosh16399/NSpM-telegram-bot
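For anyone porting the bot similarly, the Python 3 wiring is essentially this (a minimal sketch using the python-telegram-bot v13 API; `translate_to_sparql` is a stand-in for whatever function wraps the trained model):

```python
# Minimal Telegram bot skeleton: forwards each text message to the model
# and replies with the generated SPARQL query.
from telegram.ext import Updater, MessageHandler, Filters

def translate_to_sparql(question: str) -> str:
    # Placeholder: call the trained NSpM model / interpreter here.
    return "SELECT ?x WHERE { ... }"

def on_message(update, context):
    query = translate_to_sparql(update.message.text)
    update.message.reply_text(query)

updater = Updater("YOUR_BOT_TOKEN")  # token obtained from @BotFather
updater.dispatcher.add_handler(
    MessageHandler(Filters.text & ~Filters.command, on_message))
updater.start_polling()
updater.idle()
```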

I don't know if the problem is with my trained model or with the output predictor file, though the model had an average BLEU score of 94 on the test data.

I found the same issue tagged as a bug on the repo.
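In case it's useful, the BLEU number was a corpus-level score over the test split; with NLTK that check looks roughly like this (whitespace tokenization of the SPARQL sequences is an assumption):

```python
# Corpus-level BLEU between reference and predicted SPARQL token sequences.
from nltk.translate.bleu_score import corpus_bleu

refs = [["select ?x where { <a> dbo:location ?x }".split()]]
hyps = ["select ?x where { <a> dbo:location ?x }".split()]

print(corpus_bleu(refs, hyps) * 100)  # 100.0 for an exact match
```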

Sorry about that; it is apparently an NSpM parser problem.
You can try using older NSpM versions too.

I will be hosting a GSoC session next Friday; write down your questions and join me if you can.

Topic: Edgard Marx’s Zoom Meeting
Time: Apr 2, 2021 06:00 PM Amsterdam, Berlin, Rome, Stockholm, Vienna

Join Zoom Meeting

Meeting ID: 971 7701 8759
Passcode: dbpedia
One tap mobile
+16468769923,97177018759# US (New York)
+16699006833,97177018759# US (San Jose)

Dial by your location
+1 646 876 9923 US (New York)
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 346 248 7799 US (Houston)
+1 408 638 0968 US (San Jose)
Find your local number: https://eccenca.zoom.us/u/adeQxEg4QS

Hi @emarx, yes, the problem was actually with the nmt submodule, due to TensorFlow dependencies, so I replaced it with the approach from https://www.tensorflow.org/tutorials/text/nmt_with_attention as mentioned,

and it's working fine with TensorFlow 2.1.0. I have changed a few files and removed some, but the end result is the same. It can be improved further and the pipeline can also be made better, but it currently works fine, though training takes a long time.
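For context, the core of that tutorial is a Bahdanau-style attention layer along these lines (a sketch following the TF2 tutorial; the exact class names in my port may differ):

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention, as in the TF2 NMT-with-attention tutorial."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects the decoder state
        self.W2 = tf.keras.layers.Dense(units)  # projects the encoder outputs
        self.V = tf.keras.layers.Dense(1)       # scores each source position

    def call(self, query, values):
        # query: decoder hidden state, shape (batch, hidden)
        # values: encoder outputs, shape (batch, src_len, hidden)
        query_with_time_axis = tf.expand_dims(query, 1)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))
        weights = tf.nn.softmax(score, axis=1)             # attention over src_len
        context = tf.reduce_sum(weights * values, axis=1)  # weighted sum of encoder states
        return context, weights
```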


Link to the repo: https://github.com/ashutosh16399/NSpM/tree/en_NspM. Please review.
I have raised a PR for this as well. Warm-up tasks 1 and 2 are also completed.

Thanks.

Sure, Thank you.

Hi @emarx, I'm a junior undergrad student from India and have been working on NLP. I'm highly interested in the project and am starting with the warm-up tasks. Looking forward to a great learning experience with the team.

Welcome :slight_smile:

Hi @emarx, I'm currently a final-year undergrad from India. I've been working on a few ML problems so far and have two internship experiences. For GSoC, I'm interested in working on this project. I'm starting with the warm-up tasks. Hoping for a great learning experience with you.

Looks good :slight_smile:


Of course, looking forward to it.