A Multilingual Neural RDF Verbalizer - GSoC2020

diegomoussallem · February 11, 2020, 9:00pm

Description:

Natural Language Generation (NLG) is the process of generating coherent natural language text from non-linguistic data (Reiter and Dale, 2000). Despite community agreement on the actual text and speech output of these systems, there is far less consensus on what the input should be (Gatt and Krahmer, 2017). A large number of inputs have been taken for NLG systems, including images (Xu et al., 2015), numeric data (Gkatzia et al., 2014), semantic representations (Theune et al., 2001) and Semantic Web (SW) data (Ngonga Ngomo et al., 2013; Bouayad-Agha et al., 2014). Presently, the generation of natural language from SW, more precisely from RDF data, has gained substantial attention (Bouayad-Agha et al., 2014; Staykova, 2014). Some challenges have been proposed to investigate the quality of automatically generated texts from RDF (Colin et al., 2016). Moreover, RDF has demonstrated a promising ability to support the creation of NLG benchmarks (Gardent et al., 2017). However, English is the only language that has been widely targeted. Even though there are studies that explore the generation of content in languages other than English, to the best of our knowledge, no work has been proposed to train a multilingual neural model for generating texts in different languages from RDF data.

Goals:

In this GSoC project, the candidate is expected to train a multilingual neural model capable of generating natural language sentences from DBpedia RDF triples in more than one language. The idea is to extend our last GSoC project by investigating other NN architectures.
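
To make the input/output setup concrete, here is a minimal sketch of one common way to feed triples to a multilingual sequence-to-sequence model: flatten the triples into a token sequence and prepend a target-language tag, a trick borrowed from multilingual machine translation. This is an illustrative assumption, not the method used in the previous project. The `linearize` helper, the `<S>`/`<P>`/`<O>` markers, and the `<2en>`-style language tags are all hypothetical names.

```python
from typing import List, Tuple

# A triple is (subject, predicate, object), using DBpedia-style local names.
Triple = Tuple[str, str, str]


def linearize(triples: List[Triple], target_lang: str) -> str:
    """Flatten RDF triples into one source string for a seq2seq model.

    The leading <2xx> tag (hypothetical format) tells a single shared
    model which language to generate.
    """
    parts = [f"<2{target_lang}>"]
    for subj, pred, obj in triples:
        # Underscores in DBpedia local names become spaces; the <S>/<P>/<O>
        # markers (assumed, not from the source) keep the triple boundaries.
        parts += ["<S>", subj.replace("_", " "),
                  "<P>", pred,
                  "<O>", obj.replace("_", " ")]
    return " ".join(parts)


triples = [("Albert_Einstein", "birthPlace", "Ulm")]
print(linearize(triples, "en"))  # same triples, English target
print(linearize(triples, "pt"))  # same triples, Portuguese target
```

With this setup, the same set of triples yields one training pair per target language, so a single model can be trained on all languages at once. Graph-based encoders would replace the flat string with an explicit graph input.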

Impact:

The project may allow users to automatically generate short summaries, from triples, for entities that lack a human-written abstract.

Warm-up tasks:

Read the papers:
A Holistic Natural Language Generation Framework for the Semantic Web
Neural End-to-End vs Pipeline
NeuralREG: An end-to-end approach to referring expression generation
RDF2PT: Generating Brazilian Portuguese Texts from RDF Data
Attention is all you need
Download and get familiar with the code for the papers above:
https://github.com/DiegoMoussallem/RDF2NL
https://github.com/dice-group/RDF2PT
https://github.com/ThiagoCF05/NeuralREG
https://github.com/ThiagoCF05/DeepNLG/
Get familiar with our last GSoC project - https://github.com/dbpedia/neural-rdf-verbalizer

Mentors

Diego Moussallem and Thiago Castro Ferreira

Keywords

NLG, Semantic Web, NLP

ayushjain9501 · February 24, 2020, 6:24am

Hi,
I am Ayush Jain, an undergraduate at the Indian Institute of Technology, Kanpur. I am very interested in the field of Natural Language Generation and Machine Translation.
I am interested in working on this project.
I have gone through the papers mentioned as warm-up tasks and am currently going through the code base of last year’s project.

As mentioned in the description, the project aims to investigate other NN architectures.
In last year’s project, a Graph Attention encoder-based architecture was implemented.
It’d be very helpful if you could suggest any other architectures that you think would be a good idea to implement.

smit-s · February 25, 2020, 6:54pm

Hi,
I am Smit Sanghavi, an undergrad with a keen interest in NLP and Deep Learning. I have some experience working with LSTMs and I am fluent in Keras. I also have some experience with TensorFlow.
I am looking forward to contributing by taking up this project.
I have already been through the papers and the previous year's project.
Now I am trying to find ways to implement a multilingual neural model for text generation.

diogenesis · February 26, 2020, 6:07pm

Hi,
I’m Anjali, a final-year undergrad student with past research experience in NLP and computational linguistics. I would be interested in this project, and look forward to contributing to DBpedia through this.

nikhit · March 3, 2020, 4:46pm

Hi all,
I am Nikhit, a third-year undergrad majoring in CS at Andhra University, India. I have some experience in Deep Learning and NLP. I find this project quite challenging and would like to contribute to it this summer. I am currently going through the warm-up tasks, and any other tips or suggestions would be appreciated.

diegomoussallem · March 6, 2020, 12:39pm

Hi,
we have other kinds of Graph-based NNs that you can propose, or you can propose to extend the previous project. It is up to you.

Here you can find some recent work for inspiration:

diegomoussallem · March 6, 2020, 1:58pm

Please have a look at these recent works. They might help you.

msobrevillac · March 18, 2020, 8:48am

Hi,

my name is Marco Sobrevilla, a 3rd-year PhD student at the University of São Paulo, Brazil. My research topic is Natural Language Generation from semantic representations for Brazilian Portuguese. I think this project is very interesting and challenging, and I would like to contribute to it. I have read the papers from the warm-up tasks and I am checking the repos now.

lesslyrics · March 18, 2020, 11:31am

Hello! My name is Alina and I am a 3rd-year student at Saint-Petersburg State University, Russia. I study Applied Mathematics and Computer Science in the Department of Statistical Modelling and have research experience in the area of Deep Learning. I am also currently writing my term paper in the area of NLP.
I would like to contribute to this project as it seems highly interesting and challenging, so I am currently going through the warm-up tasks and starting to prepare my proposal. I am looking forward to getting in touch with the mentors. Is it possible to send proposal drafts directly to get feedback and improve them before the final version?
Thank you.