DBpedia Live Neural Question Answering Chatbot - GSoC2021

emarx · February 16, 2021, 1:57pm

Description
In this topic, the student will implement and deploy a live chatbot version of the DBpedia Neural Question Answering dataset [1].

[1] https://github.com/AKSW/DBNQA

Goal
Create a life DBpedia Neural Chatbot based on DBNQA and NSpM.

Impact
(1) Facilitate access to DBpedia content;
(2) Enable community evaluation and feedback of DBpedia NSpM models.

Warm-up tasks
(1) Fork the NsPM project (https://github.com/AKSW/NSPM );
(2) Train the Monument 300 and Monument 600 datasets https://github.com/AKSW/NSpM/tree/master/data;
(4) Fork and train the model using a subset (30 first lines) of DBNQA dataset https://github.com/AKSW/dbnqa;
(5) Instantiate the NSpM Telegram Chatbot: https://github.com/AKSW/NSpM/wiki/NSpM-Telegram-Bot

Reference Material

“An example of an excellent proposal that was accepted a few years ago. Please mind that the number of total project hours have changed from 300 to 175.” Tommaso Soru
Google Dialog Flow
Because, why not? Tutorial: https://cloud.google.com/dialogflow/es/docs/tutorials?hl=en
The DBpediaChatBot (http://chat.dbpedia.org/)
It will be interesting to re-use their interface

Mentors
Edgard Marx
Lahiru Hinguruduwa
Thiago Castro Ferreira

Keywords
#NSpM #DBpedia #Chatbot #AI #ConversationalAI

tathagata-raha · March 9, 2021, 10:05pm

Hi, I was interested in this project and would like to know more details. Should I go ahead and start with the warm up tasks?

ashutosh16399 · March 10, 2021, 3:30pm

Hii, @emarx I’m an undergrad student from India and been working on NLP and Knowledge graphs in my recent project related to news co-relation. I’m highly interested in this project, starting with the warm-up task. Looking forward to a great learning experience with you.

emarx · March 10, 2021, 7:10pm

Sure, go ahead.
I will probably add a successful proposal here as well.

emarx · March 10, 2021, 7:10pm

thanks for your interest,

Let me know if you have any question.

looking forward…

ashutosh16399 · March 12, 2021, 6:00am

Hii, @emarx in NSpM I generated the dataset using the monument_template. I got the flow of how the template.csv is broken down in annotations and then annotations are passed to generate the query and that query is then used to fetch the result and from those bindings are used to generate a pair of en_question and SPARQL, but I don’t get the complete hang of
“prepare_generator_query(template, add_type_requirements=True, do_special_class_replacement=True)”
like how the complete working query is created and what type of queries are there, so if there’s any documentation or examples for such queries it will be helpful.
correct me if i’m wrong somewhere with the flow.
Thanks.

emarx · March 13, 2021, 3:24pm

Have a look at the DBNQA repository.
I recommend you to have a look at the research papers referenced there.

as well as in the NSpM papers, so you know how the instantiation works:

ashutosh16399 · March 13, 2021, 4:52pm

Sure, thank you.

ashutosh16399 · March 18, 2021, 12:55pm

hi @emarx I’m almost done with task1… trying to generate the query, I understood the flow of training the translation model from the paper and also the architecture of the attention model used, but while I’m generating the SPARQL using English through model it gives tokens out of vocab i tried using train data as well the result is same. what’s the problem here?

Also for warmup task 2, there are different datasets in DBNQA like (art, sport,etc…) so for training shall I use the first 30 lines from all the files or from any specific file like the last one was for monuments?

Thank you.

ashutosh16399 · March 19, 2021, 12:37pm

update on this one, I wasn’t able to get the output from the model I trained from NSpM “https://github.com/AKSW/NSPM” repository so instead, I cloned the
" https://github.com/AKSW/NSpM/wiki/NSpM-Telegram-Bot " repo which had pre-trained models used this one to generate the query but the code was in python 2 so converted it to python3 and now I’m able to get the right query.

here is the updated code repo ‘https://github.com/ashutosh16399/NSpM-telegram-bot’

I don’t know if the problem is with my trained model or the output predictor file. though the model had a bleu score of 94 on avg on test data.

found the same issue tagged as bug on the repo.

emarx · March 25, 2021, 5:02pm

Sorry for that, it is visible a NSpM parser problem.
You can try to use old NSpM versions too.

emarx · March 25, 2021, 5:21pm

I will be hosting a GSOC session next Friday, write down your questions and join me if you can.

Topic: Edgard Marx’s Zoom Meeting
Time: Apr 2, 2021 06:00 PM Amsterdam, Berlin, Rome, Stockholm, Vienna

Join Zoom Meeting

Meeting ID: 971 7701 8759
Passcode: dbpedia
One tap mobile
+16468769923,97177018759# US (New York)
+16699006833,97177018759# US (San Jose)

Dial by your location
+1 646 876 9923 US (New York)
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 346 248 7799 US (Houston)
+1 408 638 0968 US (San Jose)
Meeting ID: 971 7701 8759
Find your local number: https://eccenca.zoom.us/u/adeQxEg4QS

ashutosh16399 · March 25, 2021, 7:30pm

hi @emarx yeah actually the problem was with nmt submodule due to tf dependencies so i updated it with https://www.tensorflow.org/tutorials/text/nmt_with_attention as mentioned

and it’s working fine with TensorFlow 2.1.0 I have changed few files and removed some but the end result is same.it can be improved further and pipeline can also be made better but currently this works fine though training time is large.

Link to repo - https://github.com/ashutosh16399/NSpM/tree/en_NspM please review.
i have raised a PR for this as well. Also warm up task 1 and 2 completed.

Thanks.

ashutosh16399 · March 25, 2021, 7:31pm

Sure, Thank you.

shobhitsinha13 · March 25, 2021, 7:33pm

Hii, @emarx I’m a junior undergrad student from India and been working on NLP. I’m highly interested in the project, starting with the warm-up task. Looking forward to a great learning experience with the team.

emarx · March 29, 2021, 10:07am

Welcome

priyamkakati46 · March 30, 2021, 9:04am

Hi @emarx I’m currently a final year undergrad from india.I’ve been working on few ML problems till now and have two internship experience.And for GSoC I’m interested to work in this project.I’m starting with the warmup task.Hoping for a great learning experience with you.

emarx · March 30, 2021, 10:29am

Looks good

emarx · March 30, 2021, 10:30am

ofc, looking forward