Understanding and Optimizing DBpedia Question Answering through Explanations - GSoC2022

anbo · February 10, 2022, 2:57am

DBpedia has done a great job in collecting and organizing data. Data accessibility is the key to a better understanding of the world and a foundation of Data Literacy, a crucial competence for the current generation. In 2021, a GSoC project [1] created a Dialogflow-driven chatbot that enables users to work with the DBpedia knowledge graph. This chatbot provides easy-to-use access to a Question Answering engine that searches for facts that answer the given natural-language question.

Besides using the preconfigured standard Question Answering (QA) pipeline, it is also possible to reconfigure the question answering module of the chatbot. Hence, users can select specific components that reflect their particular interest in the search results and thereby tailor the results to that interest. As we use the Qanary ecosystem as the reservoir for different QA components, there are many options to create QA pipelines. Currently, a preconfigured QA pipeline driven by QAnswer [2] and Qanary [3] is used to provide answers to typical questions.

However, an unsolved challenge of this chatbot is that users need to know specific details about the available Question Answering components and their capabilities. Additionally, users need to be aware of possible misinterpretations of their questions. For example:

The question „Who is the mayor of Springfield“ might be used to ask for the mayor of the “Springfield” from the famous American animated sitcom. However, if the system does not point to the intended fictional city of Springfield, the user needs to provide more information, change the interpretation of the question, or change the QA components used in order to influence the natural-language understanding process.
The question „What is the capital of Mars?“ seems not to be answerable because of missing data in DBpedia. However, if „capital of“ and „Mars“ are recognized correctly, then a user does not need to search for possible other ways to ask for the sought information.
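To make the second example more concrete, here is a minimal sketch of how a "glass box" response could separate "the question was understood, but the data is missing" from "the question was not understood". The `QaTrace` record and the `explain` function are hypothetical names chosen for illustration; they are not part of Qanary or the existing chatbot.

```python
from dataclasses import dataclass, field


@dataclass
class QaTrace:
    """Hypothetical record of what a QA pipeline understood for one question."""
    question: str
    entities: list[str] = field(default_factory=list)   # recognized resources, e.g. "Mars"
    relations: list[str] = field(default_factory=list)  # recognized relations, e.g. "capital of"
    answers: list[str] = field(default_factory=list)    # answers found in the knowledge graph


def explain(trace: QaTrace) -> str:
    """Turn a trace into a short, user-facing explanation."""
    if trace.answers:
        return f"Answer(s): {', '.join(trace.answers)}"
    if trace.entities and trace.relations:
        # Everything was recognized, so the gap is in the data, not in the wording.
        return (
            f"I understood {', '.join(trace.relations)} and {', '.join(trace.entities)}, "
            "but the knowledge graph has no matching fact. Rephrasing is unlikely to help."
        )
    # Something was not recognized; report the entity first if it is missing.
    missing = "entity" if not trace.entities else "relation"
    return f"I could not recognize the {missing} in your question. Try rephrasing it."


trace = QaTrace("What is the capital of Mars?", entities=["Mars"], relations=["capital of"])
print(explain(trace))
```

With such a message, the user learns that „capital of“ and „Mars“ were recognized and does not have to guess whether another phrasing would work.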

From these observations, we conclude the goals for our GSoC project:

Provide users with better access to internal information such that the Question Answering process becomes a “glass box” instead of a “black box”. For this purpose, new dialog flows as well as additional visualizations or rich responses need to be created.
As the overall goal is to improve the answer quality, additional Machine Learning components might be used to create recommendations for improved QA pipeline configurations (e.g., if a component has a low confidence score, then another one should be suggested).
Validate the results by measuring the Question Answering quality. Run A/B tests with real users to understand how explainability influences users’ satisfaction.
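As a rough illustration of the second goal, a recommendation could start from a simple rule before any Machine Learning is involved: if a component reports a confidence below a threshold, suggest an alternative. The sketch below assumes that each component exposes a confidence score between 0 and 1; the function name, the component names, and the threshold of 0.5 are made up for this example.

```python
def recommend_replacements(
    confidences: dict[str, float],
    alternatives: dict[str, list[str]],
    threshold: float = 0.5,
) -> dict[str, str]:
    """Suggest the first known alternative for every low-confidence component."""
    suggestions: dict[str, str] = {}
    for component, score in confidences.items():
        if score < threshold:
            options = alternatives.get(component, [])
            if options:  # only suggest something if an alternative is known
                suggestions[component] = options[0]
    return suggestions


# Hypothetical component names and scores, for illustration only.
scores = {"ner-a": 0.3, "query-builder-a": 0.9}
known_alternatives = {"ner-a": ["ner-b"], "query-builder-a": ["query-builder-b"]}
print(recommend_replacements(scores, known_alternatives))
```

A learned recommender could later replace the fixed threshold while keeping the same interface.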

The impact of this work would be:

The integration of explainability/traceability of search results (“glass box” behavior to help users to understand the search behavior).
The improvement of the search result quality by improving or developing from scratch one or more components in Qanary.
The identification of typical misbehavior and the creation of requests for improvements of processing steps that might fail often.
The study on the impact of the explainability feature on users’ satisfaction with the system.
Stretch: Provide the ability for the user to interactively choose paths (debugging) when multiple intermediate results are produced by a given component during a step in the pipeline and record such interventions by the user as possible training data for a future machine-learning-based system.

Warm-up tasks:

Implement a Google Dialogflow tutorial: Tutorials & samples | Dialogflow ES | Google Cloud
Get familiar with the DBpedia chatbot (from GSoC 2021): Modular DBpedia Chatbot GSoC 2021 | DBpedia-GSoC-2021 and https://github.com/dbpedia/chatbot-ng
Run simple SPARQL Queries on DBpedia to get familiar with the data and the technology (e.g., via Yasgui).
Implement a simple Qanary component using Python or Java (see the guides at [4]).
Optional: Read the tutorial on implementing a trivial Qanary-driven question answering pipeline: GitHub: https://qanswer.github.io/QA-ESWC2021/slides.pdf. Reuse the already deployed Qanary test environment (Qanary pipeline and Qanary components) to create a question answering system capable of answering the questions “What is the real name of Catwoman” and “What is the real name of Captain America”. Use the Qanary components DBpedia Spotlight and Query Builder for Real Names of Superheroes to configure your system without coding.
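For the SPARQL warm-up task, a small script can stand in for Yasgui. The following is a minimal sketch, not an official client: it builds a query for the `dbo:capital` property, builds a request URL for the public endpoint at https://dbpedia.org/sparql, and extracts values from a response in the standard SPARQL JSON results format. The helper names are chosen for illustration, and the resource name is inserted into the query without validation, which is acceptable only for experiments.

```python
import json
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def build_capital_query(resource: str) -> str:
    """Build a SPARQL query asking for the capital of a DBpedia resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX dbr: <http://dbpedia.org/resource/>\n"
        f"SELECT ?capital WHERE {{ dbr:{resource} dbo:capital ?capital }}"
    )


def build_request_url(query: str) -> str:
    """Build a GET URL that asks the endpoint for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{DBPEDIA_ENDPOINT}?{urlencode(params)}"


def extract_values(result_json: str, variable: str) -> list[str]:
    """Collect the values bound to one variable in a SPARQL JSON result."""
    data = json.loads(result_json)
    return [b[variable]["value"] for b in data["results"]["bindings"] if variable in b]


print(build_request_url(build_capital_query("Germany")))
```

The printed URL can be opened in a browser or fetched with `urllib.request.urlopen`; passing the response body to `extract_values(body, "capital")` returns the bound values, or an empty list when no fact is found, as the example above expects for `Mars`.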

The project size can follow the medium-sized (~175 hours) and large (~350 hours) format. However, we prefer the large format as it provides more opportunities to increase the impact.

Mentors:

Remark: We are also happy to work together with the project executor in preparing a scientific publication on the project results.

Keywords: Question Answering, Natural-language processing (NLP), Natural-language understanding (NLU), Recommendation, Machine Learning, Explainable AI, Knowledge Graphs, Linked Data, Semantic Web

References:

[1] GSoC 2021 project: Modular DBpedia Chatbot

[2] https://qanswer-frontend.univ-st-etienne.fr/

[3] https://github.com/WDAqua/Qanary

[4] https://github.com/WDAqua/Qanary/wiki

nakulraghav · February 12, 2022, 4:32pm

Hello everyone!

My name is Nakul Raghav and I am an Electrical Engineering student at Delhi Technological University, India. I wish to contribute to DBpedia as a GSoC ’22 student.

A brief background about me: I am a web developer with significant experience in front-end development using JavaScript and HTML. Apart from this, I take an interest in community-oriented chatbot and software development and aspire to do the same here!

Attaching links to my personal website (Basic Banking System) and my chatbot (https://github.com/NakulRaghav/finalbot) for the mentors’ reference.

I have been going through last year’s GSoC projects and am particularly interested in the DBpedia Chatbot project. If the project is open this year too, I would like to connect with the mentors to discuss a proposal for contributing to it.

Looking forward to a great time here!

anbo · February 24, 2022, 1:22am

Hi Nakul,

thanks for reaching out to us and for your interest in this GSoC project.

We will add you to the project’s internal Slack channel next week.

I am looking forward to discussing the project idea with you,
Andreas

varuniyer · March 10, 2022, 3:50am

Hi everyone! I’m Varun, a master’s student at Johns Hopkins University. I have prior experience in NLP research and my full CV can be viewed here.

I am specifically interested in this project because it could help improve the QA pipeline using recent advancements in neural NLP models. I would be interested in developing approaches to improve NLP components within the existing pipeline. To this end, what’s the best way I can get started on the application process to GSoC 22?

bhargav6031 · March 24, 2022, 5:51pm

@anbo @ramgathreya

Namaste,

I’m Bhargav, a second year undergraduate student pursuing my bachelor’s in Information and Computer Science. I would like to be part of this amazing journey with you guys.

I’m familiar with low-level TensorFlow and PyTorch implementations of BERT-style models such as RoBERTa-base, and I’m also comfortable with Hugging Face pipelines.

I would love to hear from you guys regarding the next steps.

My resume …

ad0lphus · April 4, 2022, 5:46pm

Hey Y’all,
I am Prabith GS, a second-year CSE undergrad. I am interested in working on the DBpedia Chatbot for this year’s summer school.
As of now, I am familiar with Python, Java, Go, C, and C++, and in web dev I am currently working with HTML, CSS, JavaScript, TypeScript, React, and Angular. I also have working experience with Django.
I worked as a product engineer (developer) for Traboda Cyber labs (https://app.traboda.com/), which uses React with TypeScript. I also have experience as a reverse engineer in a student-run club named team bi0s (https://bi0s.in/), where I reverse engineer different software and find vulnerabilities.

resume : cv.prabith.gq/Prabith_Resume.pdf
portfolio: about.prabith.gq/

anbo · April 4, 2022, 10:57pm

Dear GSoC interested parties,

if you haven’t contacted us yet, please join the DBpedia Slack Channel: https://dbpedia.slack.com/archives/C0HN7KP9R

Create a new channel there and invite the four people mentioned above.

We will look at your questions about the project and the application process individually for each of you.

Looking forward to discussing with you,
Andreas et al.

anbo · April 4, 2022, 10:58pm

Please note the message: Understanding and Optimizing DBpedia Question Answering through Explanations - GSoC2022 - #7 by anbo

sachinyadavg629 · January 16, 2024, 10:14am

It represents a valuable initiative in the stages of new product development. This project involves leveraging DBpedia for question answering and optimizing the process through explanatory mechanisms. In the early stages, there is likely a focus on requirements analysis and laying the foundation for the project. As development progresses, integrating and fine-tuning DBpedia for effective question answering becomes a pivotal phase. Iterative improvements and debugging would follow, ensuring the product aligns with user expectations. The project underscores the significance of thorough testing and continuous refinement, essential components in the dynamic stages of new product development. Overall, this initiative contributes to advancing knowledge extraction and question answering capabilities within the DBpedia ecosystem.