DBpedia has done a great job of collecting and organizing data. Data accessibility is key to a better understanding of the world and a foundation of Data Literacy, a crucial competence for the current generation. In 2021, a GSoC project [1] created a Dialogflow-driven chatbot that enables users to work with the DBpedia knowledge graph. This chatbot provides easy-to-use access to a Question Answering engine that searches for facts answering a given natural-language question.
In addition to using the preconfigured standard Question Answering (QA) pipeline, it is possible to re-configure the question answering module of the chatbot. Hence, users can select specific components that reflect their particular interest in the search results and thereby optimize the results for their needs. As we use the Qanary ecosystem as the reservoir of QA components, there are many options for creating QA pipelines. Currently, a preconfigured QA pipeline driven by QAnswer [2] and Qanary [3] is used to answer typical questions.
However, an unsolved challenge of this chatbot is that users need to know specific details about the available Question Answering components and their capabilities. Additionally, users need to be aware of possible misinterpretations of their questions. For example:
- The question “Who is the mayor of Springfield?” might be intended to ask for the mayor of the Springfield known from the famous American animated sitcom. However, if the system does not point to the intended fictional city of Springfield, the user needs to provide more information, change the phrasing of the question, or change the QA components used, in order to influence the natural-language understanding process.
- The question “What is the capital of Mars?” seems unanswerable because of missing data in DBpedia. However, if the system shows that “capital of” and “Mars” were recognized correctly, then the user knows the data is simply unavailable and does not need to search for other ways of asking for the sought information.
From these observations, we conclude the goals for our GSoC project:
- Provide users with better access to internal information, such that the Question Answering process becomes a “glass box” instead of a “black box”. For this purpose, new dialog flows, additional visualizations, and rich responses need to be created.
- As the overall goal is to improve the answer quality, additional Machine Learning components might be used to create recommendations for improved QA pipeline configurations (e.g., if a component has a low confidence score, an alternative component should be suggested).
- Validate the results by measuring the Question Answering quality. Run A/B tests with real users to understand how explainability influences users’ satisfaction.
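The recommendation idea in the second goal can be illustrated as a simple rule: if a component reports a confidence below some threshold, suggest alternative components for the same task. The component names, the alternatives registry, and the threshold below are hypothetical assumptions for illustration only, not part of the actual Qanary ecosystem:

```python
# Hypothetical registry mapping a QA component to alternatives for the
# same task (names are illustrative, not real Qanary component IDs).
ALTERNATIVES = {
    "DBpediaSpotlight-NED": ["TagMe-NED", "Falcon-NED"],
    "SINA-QueryBuilder": ["QAnswer-QueryBuilder"],
}

# Assumed cut-off below which a component's result is considered unreliable.
CONFIDENCE_THRESHOLD = 0.5

def recommend(component: str, confidence: float) -> list:
    """Suggest alternative components if the confidence is too low."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return []  # current configuration seems fine, no suggestion needed
    return ALTERNATIVES.get(component, [])

# A low-confidence named-entity disambiguation triggers suggestions;
# a high-confidence one does not.
print(recommend("DBpediaSpotlight-NED", 0.3))
print(recommend("DBpediaSpotlight-NED", 0.9))
```

A learned model could later replace this static rule, using recorded user interventions as training data.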
The impact of this work would be:
- The integration of explainability/traceability of search results (“glass box” behavior that helps users understand the search behavior).
- The improvement of search result quality by improving existing Qanary components or developing new ones from scratch.
- The identification of typical misbehavior and the creation of improvement requests for processing steps that frequently fail.
- The study on the impact of the explainability feature on users’ satisfaction with the system.
- Stretch: Provide users with the ability to interactively choose paths (debugging) when a component produces multiple intermediate results during a pipeline step, and record such interventions as possible training data for a future machine-learning-based system.
Warm-up tasks:
- Implement a Google Dialogflow tutorial: Tutorials & samples | Dialogflow ES | Google Cloud
- Get familiar with the DBpedia chatbot (from GSoC 2021): Modular DBpedia Chatbot GSoC 2021 | DBpedia-GSoC-2021 and https://github.com/dbpedia/chatbot-ng
- Run simple SPARQL Queries on DBpedia to get familiar with the data and the technology (e.g., via Yasgui).
- Implement a simple Qanary component using Python or Java (see the guides at [4]).
- Optional: Read the tutorial on implementing a trivial Qanary-driven question answering pipeline: https://qanswer.github.io/QA-ESWC2021/slides.pdf. Reuse the already deployed Qanary test environment (Qanary pipeline and Qanary components) to create a question answering system capable of answering the questions “What is the real name of Catwoman?” and “What is the real name of Captain America?”. Use the Qanary components DBpedia Spotlight and Query Builder for Real Names of Superheroes to configure your system without coding.
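As a starting point for the SPARQL warm-up task, the public DBpedia endpoint can also be queried programmatically. The following is a minimal sketch (using only the Python standard library) that requests results in the standard `application/sparql-results+json` format and extracts the bound values; the example query asking for the capital of Germany is our own choice:

```python
import json
import urllib.parse
import urllib.request

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Example query: the capital of Germany in the DBpedia knowledge graph.
QUERY = """
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?capital WHERE { dbr:Germany dbo:capital ?capital }
"""

def extract_bindings(result_json: dict, var: str) -> list:
    """Collect the values bound to ?var from a SPARQL JSON result document."""
    return [b[var]["value"]
            for b in result_json["results"]["bindings"]
            if var in b]

def ask_dbpedia(query: str) -> dict:
    """Send the query to the DBpedia endpoint and return the parsed JSON result."""
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(f"{DBPEDIA_ENDPOINT}?{params}") as response:
        return json.load(response)

if __name__ == "__main__":
    # Prints the URI(s) bound to ?capital, e.g. the Berlin resource.
    print(extract_bindings(ask_dbpedia(QUERY), "capital"))
```

Tools such as Yasgui are more convenient for exploration, but a small script like this shows the raw request/response cycle that QA components build upon.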
The project size can follow either the medium (~175 hours) or the large (~350 hours) format. However, we prefer the large format as it provides more opportunities to increase the impact.
Mentors:
Remark: We are also happy to work together with the project executor in preparing a scientific publication on the project results.
Keywords: Question Answering, Natural-Language Processing (NLP), Natural-Language Understanding (NLU), Recommendation, Machine Learning, Explainable AI, Knowledge Graphs, Linked Data, Semantic Web
References:
[1] GSoC 2021 project: Modular DBpedia Chatbot
[2] https://qanswer-frontend.univ-st-etienne.fr/