A Multilingual Neural RDF Verbalizer - GSoC2021


Natural Language Generation (NLG) is the process of generating coherent natural language text from non-linguistic data (Reiter and Dale, 2000). Despite community agreement on the actual text and speech output of these systems, there is far less consensus on what the input should be (Gatt and Krahmer, 2017). A wide range of inputs has been used for NLG systems, including images (Xu et al., 2015), numeric data (Gkatzia et al., 2014), semantic representations (Theune et al., 2001) and Semantic Web (SW) data (Ngonga Ngomo et al., 2013; Bouayad-Agha et al., 2014). Recently, the generation of natural language from SW data, more precisely from RDF data, has gained substantial attention (Bouayad-Agha et al., 2014; Staykova, 2014). Several challenges have been proposed to investigate the quality of automatically generated texts from RDF (Colin et al., 2016). Moreover, RDF has demonstrated a promising ability to support the creation of NLG benchmarks (Gardent et al., 2017). However, English is the only language that has been widely targeted. Even though there are studies that explore the generation of content in languages other than English, to the best of our knowledge, our previous GSoC project, NABU, is the only one that proposes training a single multilingual neural model for generating texts in different languages from RDF data.

Previous GSoC 2020

We published NABU, our multilingual neural RDF verbalizer, at ISWC.


In this GSoC project, the candidate is expected to train and extend our multilingual neural model, which is capable of generating natural language sentences from DBpedia RDF triples in more than one language. The idea is to build on our previous GSoC project by investigating other neural network (NN) architectures.
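To give a concrete picture of the task, a neural verbalizer typically receives RDF triples flattened into a single input sequence, often with a target-language token so one model can serve several languages. The sketch below is only illustrative: the tag format (`<S>`, `<P>`, `<O>`) and the language-token convention are hypothetical, and NABU's actual preprocessing may differ.

```python
# Hypothetical sketch: linearizing DBpedia RDF triples into a flat input
# sequence for a multilingual seq2seq verbalizer. The <S>/<P>/<O> markers
# and the leading language token are illustrative assumptions, not NABU's
# exact format.

def linearize(triples, lang="en"):
    """Flatten (subject, predicate, object) triples into one input string,
    prefixed with a target-language token."""
    parts = [f"<{lang}>"]
    for subj, pred, obj in triples:
        parts.append(f"<S> {subj} <P> {pred} <O> {obj}")
    return " ".join(parts)

triples = [
    ("Albert_Einstein", "birthPlace", "Ulm"),
    ("Albert_Einstein", "field", "Physics"),
]
print(linearize(triples, lang="de"))
# -> <de> <S> Albert_Einstein <P> birthPlace <O> Ulm <S> Albert_Einstein <P> field <O> Physics
```

The encoder of the neural model would consume this sequence, and the decoder would generate the sentence in the language indicated by the leading token.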


The project may allow users to automatically generate short summaries for entities that lack a human-written abstract, using their RDF triples.

Warm-up tasks:


Diego Moussallem and Thiago Castro Ferreira


NLG, Semantic Web, NLP
