Description
Data-to-text Natural Language Generation (NLG) is the computational process of generating natural language from non-linguistic data. Recently, the field has made significant progress in terms of data, with the creation of benchmarks such as the E2E dataset [6], ROTOWIRE [8] and the WebNLG corpus [4, 5], and in terms of models, with the development of several approaches based on traditional pipeline architectures [1] as well as novel end-to-end, fully differentiable ones [2, 7]. However, progress has not been as solid in terms of evaluation, which is perhaps the biggest bottleneck of the field nowadays. Although automatic evaluation metrics such as BLEU and METEOR are frequently used, there is empirical evidence, and a consensus among scholars, that they do not always correlate with human ratings. In contrast, human evaluation of the outputs of data-to-text models is more accurate, but expensive; it is usually performed properly only in shared tasks, which generally have the resources to do it.

To combine the advantages of automatic evaluation (e.g., low cost) and human evaluation (e.g., accuracy), some studies, such as [3], propose a data-driven quality-estimation method, in which a neural network is trained on public ratings from previous human evaluations to automatically score the quality of a machine-generated text. But even though such quality-estimation methods combine the accuracy of human evaluation with the low cost of automatic evaluation, how can the estimated quality be used to improve the data-to-text approach itself? This project aims to answer this question by developing generative adversarial networks (GANs) that convert a set of RDF triples into English text, where the quality estimator is used to train an RDF-to-text neural network in an adversarial, fully differentiable way.
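As a conceptual illustration of this training signal (a toy sketch only, not the project's actual architecture), the snippet below backpropagates a quality estimator's score through a generator. The one-parameter "generator" and the hand-written "quality estimator" are hypothetical stand-ins; a real system would use an autograd framework and neural models for both components.

```python
import math

def sigmoid(w):
    """Toy 'generator': one parameter w, one scalar output y."""
    return 1.0 / (1.0 + math.exp(-w))

def quality(y):
    """Frozen toy 'quality estimator': prefers outputs near 0.8."""
    return 1.0 - (y - 0.8) ** 2

w, lr = 0.0, 1.0
for _ in range(200):
    y = sigmoid(w)
    # Chain rule by hand: d quality/dw = d quality/dy * dy/dw.
    # In a real, fully differentiable system, autograd would do this.
    grad = -2.0 * (y - 0.8) * y * (1.0 - y)
    w += lr * grad  # gradient ascent on the estimated quality

print(round(sigmoid(w), 2))  # the generator's output converges toward 0.8
```

The point of the sketch is the flow of information: the estimator never provides reference texts, only a differentiable score, yet that score alone is enough to update the generator's parameters.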
Goals:
To develop an RDF-to-text approach based on generative adversarial networks, in which the quality of the textual outputs can be accurately and efficiently estimated and backpropagated through the model.
Impact:
To provide the community with an approach that automatically generates, from triples, short summaries for entities that lack a human-written abstract, and to help solve the evaluation bottleneck in the research area of data-to-text natural language generation.
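On the input side, such a system typically linearizes a set of RDF triples into a token sequence before feeding it to a neural encoder. A minimal sketch of that step is shown below; the `<subj>`/`<pred>`/`<obj>` marker tokens are an illustrative assumption, not a fixed standard.

```python
def linearize(triples):
    """Flatten (subject, predicate, object) triples into one token string."""
    tokens = []
    for subj, pred, obj in triples:
        tokens += ["<subj>", subj, "<pred>", pred, "<obj>", obj]
    return " ".join(tokens)

triples = [
    ("Albert_Einstein", "birthPlace", "Ulm"),
    ("Albert_Einstein", "field", "Physics"),
]
print(linearize(triples))
# <subj> Albert_Einstein <pred> birthPlace <obj> Ulm <subj> Albert_Einstein <pred> field <obj> Physics
```

From a sequence like this, the generator would produce a short English summary such as "Albert Einstein, born in Ulm, was a physicist."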
Warm-up tasks:
Read the papers:
DBpedia-to-text:
Deep Graph Convolutional Encoders for Structured Data to Text Generation
Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Neural data-to-text generation: A comparison between pipeline and end-to-end architectures
Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation
Other important works:
Text Generation from Knowledge Graphs with Graph Transformers
Data-to-Text Generation with Content Selection and Planning
Mentors
Thiago Castro Ferreira, Diego Moussallem and Mariana Rachel Dias da Silva
Keywords
Neural Networks, Natural Language Generation, Knowledge Graphs
References:
[1] Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, and Emiel Krahmer. Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 552–562, Hong Kong, China, November 2019. Association for Computational Linguistics.
[2] Marco Damonte and Shay B. Cohen. Structural neural encoders for AMR-to-text generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3649–3658, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
[3] Ondrej Dusek, Jekaterina Novikova, and Verena Rieser. Referenceless quality estimation for natural language generation. CoRR, abs/1708.01759, 2017.
[4] Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. Creating training corpora for NLG micro-planners. In Proceedings of ACL-2017. Association for Computational Linguistics.
[5] Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. The WebNLG challenge: Generating text from RDF data. In Proceedings of the 10th International Conference on Natural Language Generation, INLG'17, pages 124–133, Santiago de Compostela, Spain, 2017. Association for Computational Linguistics.
[6] Jekaterina Novikova, Ondrej Dusek, and Verena Rieser. The E2E dataset: New challenges for end-to-end generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–206, Saarbrücken, Germany, 2017.
[7] Leonardo F. R. Ribeiro, Yue Zhang, Claire Gardent, and Iryna Gurevych. Modeling global and local node contexts for text generation from knowledge graphs, 2020.
[8] Sam Wiseman, Stuart Shieber, and Alexander Rush. Challenges in data-to-document generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253–2263, Copenhagen, Denmark, September 2017. Association for Computational Linguistics.