Description
Data-to-text Natural Language Generation (NLG) is the computational process of generating natural language from non-linguistic data. Recently, the field has made significant progress in terms of data, with the creation of benchmarks such as the E2E dataset [6], ROTOWIRE [8] and the WebNLG corpus [4, 5], and in terms of models, with the development of several approaches based on traditional pipeline architectures [1] as well as novel end-to-end, fully differentiable ones [2, 7]. However, progress has not been as solid in terms of evaluation, which is perhaps the biggest bottleneck of the field nowadays. Although automatic evaluation metrics such as BLEU and METEOR are frequently used, there is empirical evidence, and a consensus among scholars, that they do not always correlate with human ratings. In contrast, human evaluation of the outputs of data-to-text models is more accurate, but expensive; it is usually performed properly only in shared tasks, which generally have the resources to do it.

To combine the advantages of automatic evaluation (e.g., low cost) and human evaluation (e.g., accuracy), some studies, such as [3], propose a data-driven quality-estimation method, in which a neural network is trained on public ratings from previous human evaluations to automatically score the quality of a machine-generated text. But even though such quality-estimation methods combine the accuracy of human evaluation with the low cost of automatic evaluation, how can the estimated quality be used to improve the data-to-text approach itself? This project aims to answer this question by developing generative adversarial networks (GANs) that convert a set of RDF triples into English text, where the quality estimator is used to train an RDF-to-text neural network in an adversarial, fully differentiable way.
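As a conceptual illustration of this training signal (a toy sketch only, not the project's actual architecture), the snippet below backpropagates a quality estimator's score through a generator. The one-parameter "generator" and the hand-written "quality estimator" are hypothetical stand-ins; a real system would use an autograd framework and neural models for both components.

```python
import math

def sigmoid(w):
    """Toy 'generator': one parameter w, one scalar output y."""
    return 1.0 / (1.0 + math.exp(-w))

def quality(y):
    """Frozen toy 'quality estimator': prefers outputs near 0.8."""
    return 1.0 - (y - 0.8) ** 2

w, lr = 0.0, 1.0
for _ in range(200):
    y = sigmoid(w)
    # Chain rule by hand: d quality/dw = d quality/dy * dy/dw.
    # In a real, fully differentiable system, autograd would do this.
    grad = -2.0 * (y - 0.8) * y * (1.0 - y)
    w += lr * grad  # gradient ascent on the estimated quality

print(round(sigmoid(w), 2))  # the generator's output converges toward 0.8
```

The point of the sketch is the flow of information: the estimator never provides reference texts, only a differentiable score, yet that score alone is enough to update the generator's parameters.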
Goals:
To develop an RDF-to-text approach based on generative adversarial networks, in which the quality of the textual outputs can be accurately and efficiently estimated and backpropagated through the model.
Impact:
To provide the community with an approach that automatically generates, from triples, short summaries for entities that lack a human-written abstract, and to help solve the evaluation bottleneck in the research area of data-to-text natural language generation.
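On the input side, such a system typically linearizes a set of RDF triples into a token sequence before feeding it to a neural encoder. A minimal sketch of that step is shown below; the `<subj>`/`<pred>`/`<obj>` marker tokens are an illustrative assumption, not a fixed standard.

```python
def linearize(triples):
    """Flatten (subject, predicate, object) triples into one token string."""
    tokens = []
    for subj, pred, obj in triples:
        tokens += ["<subj>", subj, "<pred>", pred, "<obj>", obj]
    return " ".join(tokens)

triples = [
    ("Albert_Einstein", "birthPlace", "Ulm"),
    ("Albert_Einstein", "field", "Physics"),
]
print(linearize(triples))
# <subj> Albert_Einstein <pred> birthPlace <obj> Ulm <subj> Albert_Einstein <pred> field <obj> Physics
```

From a sequence like this, the generator would produce a short English summary such as "Albert Einstein, born in Ulm, was a physicist."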
Warm-up tasks:
Read the papers:
DBpedia-to-text:
Deep Graph Convolutional Encoders for Structured Data to Text Generation
Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Neural data-to-text generation: A comparison between pipeline and end-to-end architectures
Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation
Other important works:
Text Generation from Knowledge Graphs with Graph Transformers
Data-to-Text Generation with Content Selection and Planning
Mentors
Thiago Castro Ferreira, Diego Moussallem and Mariana Rachel Dias da Silva
Keywords
Neural Networks, Natural Language Generation, Knowledge Graphs
References:
[1] Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, and Emiel Krahmer. Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 552–562, Hong Kong, China, November 2019. Association for Computational Linguistics.
[2] Marco Damonte and Shay B. Cohen. Structural neural encoders for AMR-to-text generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3649–3658, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
[3] Ondrej Dusek, Jekaterina Novikova, and Verena Rieser. Referenceless quality estimation for natural language generation. CoRR, abs/1708.01759, 2017.
[4] Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. Creating training corpora for NLG micro-planners. In Proceedings of ACL-2017. Association for Computational Linguistics.
[5] Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. The WebNLG challenge: Generating text from RDF data. In Proceedings of the 10th International Conference on Natural Language Generation, INLG'17, pages 124–133, Santiago de Compostela, Spain, 2017. Association for Computational Linguistics.
[6] Jekaterina Novikova, Ondrej Dusek, and Verena Rieser. The E2E dataset: New challenges for end-to-end generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–206, Saarbrücken, Germany, 2017.
[7] Leonardo F. R. Ribeiro, Yue Zhang, Claire Gardent, and Iryna Gurevych. Modeling global and local node contexts for text generation from knowledge graphs, 2020.
[8] Sam Wiseman, Stuart Shieber, and Alexander Rush. Challenges in data-to-document generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253–2263, Copenhagen, Denmark, September 2017. Association for Computational Linguistics.