Neural Translation and Enrichment of Knowledge Graphs - GSoC2020


Knowledge graphs are used in an increasing number of applications. Although considerable human effort has been invested in making knowledge graphs available in multiple languages, most knowledge graphs are in English. Additionally, regional facts are often available only in the language of the corresponding region. This lack of multilingual knowledge clearly limits the porting of machine learning models to other languages. To alleviate this drawback, we previously proposed THOTH, an approach for translating and enriching knowledge graphs across languages. THOTH extracts bilingual alignments between a source and a target knowledge graph and learns how to translate from one to the other by relying on two different recurrent neural network models along with knowledge graph embeddings.

We evaluated THOTH extrinsically by comparing the German DBpedia with the German translation of the English DBpedia on two tasks: fact checking and entity linking. In addition, we ran a manual intrinsic evaluation of the translation. Our results showed that THOTH is a promising approach, achieving a translation accuracy of 88.56%. Moreover, its enrichment significantly improves the quality of the German DBpedia: we report gains of +18.4% accuracy for fact validation and +19% F1 for entity linking.
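The alignment step can be illustrated with a small sketch. It is not THOTH's actual implementation: it assumes each KG is a Python set of (subject, predicate, object) tuples and that a hypothetical `entity_map` (for instance, built from interlanguage links) maps source entities to target entities. Two triples are paired when their subjects and objects correspond, and the predicates are left for a model to learn; this is one plausible reading of "bilingual alignments". The `en:` and `de:` identifiers are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def extract_alignments(
    source_kg: Set[Triple],
    target_kg: Set[Triple],
    entity_map: Dict[str, str],
) -> List[Tuple[Triple, Triple]]:
    """Pair source and target triples whose subject and object correspond.

    `entity_map` is a hypothetical source->target entity alignment; values
    missing from it (e.g. literals) are compared unchanged.
    """
    # Index target triples by (subject, object) so each lookup is O(1).
    by_endpoints: Dict[Tuple[str, str], List[Triple]] = {}
    for s, p, o in target_kg:
        by_endpoints.setdefault((s, o), []).append((s, p, o))

    pairs: List[Tuple[Triple, Triple]] = []
    for s, p, o in sorted(source_kg):
        key = (entity_map.get(s, s), entity_map.get(o, o))
        for target in sorted(by_endpoints.get(key, [])):
            pairs.append(((s, p, o), target))
    return pairs


def linearize(triple: Triple) -> str:
    """Flatten a triple into a space-separated token sequence for a seq2seq model."""
    return " ".join(triple)


# Toy example with illustrative identifiers.
en_kg = {("en:Berlin", "en:country", "en:Germany"), ("en:Berlin", "en:river", "en:Spree")}
de_kg = {("de:Berlin", "de:Land", "de:Deutschland")}
mapping = {"en:Berlin": "de:Berlin", "en:Germany": "de:Deutschland"}

for src, tgt in extract_alignments(en_kg, de_kg, mapping):
    print(linearize(src), "=>", linearize(tgt))
# prints: en:Berlin en:country en:Germany => de:Berlin de:Land de:Deutschland
```

The linearized pairs are the kind of parallel data a sequence-to-sequence translator (RNN or otherwise) could be trained on; the `en:river` triple is not paired because `en:Spree` has no entry in the toy alignment.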


In this GSoC project, the idea is not to officially enrich the DBpedia KG, but rather to investigate THOTH with other neural network architectures and different knowledge graph embedding techniques, with the aim of improving other downstream NLP tasks such as machine translation and question answering.
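To make "different knowledge graph embedding techniques" concrete, the sketch below puts two well-known scoring functions behind one interface so that a pipeline could switch between them by name: TransE (translation-based, where a true triple should satisfy h + r ≈ t) and DistMult (a trilinear product). This is a hedged illustration with hand-picked 2-d vectors; real embeddings are learned, and neither function is claimed to be what THOTH uses. The `SCORERS` registry and the `rank_tails` helper are hypothetical names.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE: score = -||h + r - t||_2; closer to 0 means more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h: Vector, r: Vector, t: Vector) -> float:
    """DistMult: score = sum_i h_i * r_i * t_i; higher means more plausible."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Hypothetical registry: the embedding technique becomes a configuration choice.
SCORERS: Dict[str, Callable[[Vector, Vector, Vector], float]] = {
    "transe": transe_score,
    "distmult": distmult_score,
}


def rank_tails(method: str, head: Vector, relation: Vector,
               candidates: Dict[str, Vector]) -> List[str]:
    """Rank candidate tail entities for (head, relation, ?) under the chosen scorer."""
    score = SCORERS[method]
    return sorted(candidates,
                  key=lambda name: score(head, relation, candidates[name]),
                  reverse=True)


head, relation = [1.0, 0.5], [0.5, 1.0]
candidates = {"x": [1.5, 1.5], "y": [-1.0, 0.0]}
print(rank_tails("transe", head, relation, candidates))    # ['x', 'y']
print(rank_tails("distmult", head, relation, candidates))  # ['x', 'y']
```

Keeping the scorer behind a common signature is what lets the rest of an enrichment pipeline stay unchanged when the embedding technique is swapped.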

The project may allow users to artificially enrich low-resource DBpedia KGs for use in essential NLP tasks and/or to augment knowledge graph-based machine learning models.
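As a rough illustration of what artificial enrichment means here, the sketch below adds to a target KG every source triple that a translator can render and that the target does not already contain. The `translate` callable stands in for a trained neural model; the dictionary-based `toy_translate` and its `LEXICON` are purely hypothetical and exist only so the example runs.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def enrich(
    source_kg: Set[Triple],
    target_kg: Set[Triple],
    translate: Callable[[Triple], Optional[Triple]],
) -> Set[Triple]:
    """Return a copy of target_kg extended with translated source triples.

    `translate` returns None when a triple cannot be translated; such
    triples are skipped. Set semantics drop facts the target already has.
    """
    enriched = set(target_kg)
    for triple in source_kg:
        translated = translate(triple)
        if translated is not None:
            enriched.add(translated)
    return enriched


# Hypothetical word-by-word lexicon standing in for a trained translation model.
LEXICON: Dict[str, str] = {
    "en:Berlin": "de:Berlin", "en:country": "de:Land", "en:Germany": "de:Deutschland",
    "en:river": "de:Fluss", "en:Spree": "de:Spree",
}


def toy_translate(triple: Triple) -> Optional[Triple]:
    parts = [LEXICON.get(term) for term in triple]
    if any(part is None for part in parts):
        return None  # untranslatable term: skip rather than guess
    s, p, o = parts
    return (s, p, o)


source = {("en:Berlin", "en:country", "en:Germany"),
          ("en:Berlin", "en:river", "en:Spree"),
          ("en:Berlin", "en:mayor", "en:Someone")}
target = {("de:Berlin", "de:Land", "de:Deutschland")}
print(enrich(source, target, toy_translate) - target)
# {('de:Berlin', 'de:Fluss', 'de:Spree')}
```

The `country` fact is already in the target and the `mayor` fact cannot be translated, so only the `river` fact is new. A real model would also need a confidence threshold to avoid adding wrong facts.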

Warm-up tasks:

  • Read the papers:
      • Transformer:
      • Survey on Knowledge Graph Embeddings:


Diego Moussallem


Neural Networks, NLP, Semantic Web


I am interested in this project idea for GSoC 2020. What kind of code-level contributions could I make to this project?

Hi, the idea is to have a framework containing different NN algorithms to perform this enrichment. You can rely on TensorFlow libraries to build it and reuse implementations from OpenNMT or whichever framework you prefer. Did I answer your question? It was a bit vague to me.

Yes, you have answered my question.
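The mentor's suggestion of a framework containing different NN algorithms could be organised around a registry, so that a new architecture is added by registering one builder function and is selected through configuration. The sketch is hypothetical: `register`, `build` and `ModelSpec` are illustrative names, and the builders return plain configuration records rather than real TensorFlow or OpenNMT models, which is where an actual implementation would construct the network. The Transformer defaults follow the base model of "Attention Is All You Need" (d_model 512, 8 heads, 6 layers); the RNN defaults are arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModelSpec:
    """Placeholder for a built model; a real framework would return a network here."""
    architecture: str
    hyperparams: dict


_BUILDERS: Dict[str, Callable[..., ModelSpec]] = {}


def register(name: str):
    """Decorator that adds a model builder to the registry under `name`."""
    def wrap(fn: Callable[..., ModelSpec]) -> Callable[..., ModelSpec]:
        if name in _BUILDERS:
            raise ValueError(f"duplicate architecture: {name}")
        _BUILDERS[name] = fn
        return fn
    return wrap


@register("rnn")
def build_rnn(hidden_size: int = 256, layers: int = 2) -> ModelSpec:
    return ModelSpec("rnn", {"hidden_size": hidden_size, "layers": layers})


@register("transformer")
def build_transformer(d_model: int = 512, heads: int = 8, layers: int = 6) -> ModelSpec:
    return ModelSpec("transformer", {"d_model": d_model, "heads": heads, "layers": layers})


def build(config: dict) -> ModelSpec:
    """Build the architecture named in config['architecture'] with the other keys as kwargs."""
    params = dict(config)  # copy so the caller's config is not mutated
    name = params.pop("architecture")
    if name not in _BUILDERS:
        raise KeyError(f"unknown architecture {name!r}; known: {sorted(_BUILDERS)}")
    return _BUILDERS[name](**params)


print(build({"architecture": "transformer", "heads": 4}))
```

With this layout, comparing THOTH's recurrent models against a Transformer becomes a change of configuration rather than of pipeline code.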