Neural Translation and Enrichment of Knowledge Graphs - GSoC2020


Knowledge Graphs are used in an increasing number of applications. Although considerable human effort has been invested into making knowledge graphs available in multiple languages, most knowledge graphs are in English. Additionally, regional facts are often only available in the language of the corresponding region. This lack of multilingual knowledge availability clearly limits the porting of machine learning models to different languages. To alleviate this drawback, we previously proposed THOTH, which is an approach for translating and enriching knowledge graphs across languages. THOTH extracts bilingual alignments between a source and target knowledge graph and learns how to translate from one to the other by relying on two different recurrent neural network models along with knowledge graph embeddings. We evaluated THOTH extrinsically by comparing the German DBpedia with the German translation of the English DBpedia on two tasks: fact-checking and entity linking. In addition, we ran a manual intrinsic evaluation of the translation. Our results showed that THOTH is a promising approach that achieves a translation accuracy of 88.56%. Moreover, its enrichment improves the quality of the German DBpedia significantly, as we report +18.4% accuracy for fact validation and +19% F1 for entity linking.
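To make the alignment step a bit more concrete, here is a minimal Python sketch of one simple way to collect bilingual triple pairs from a source and a target knowledge graph. It is only an illustration under stated assumptions, not THOTH's actual implementation: the function name `extract_aligned_triples`, the `en:`/`de:` shorthand prefixes, and the idea of using a cross-lingual entity link table (in the spirit of `owl:sameAs` links) are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def extract_aligned_triples(
    source_triples: List[Triple],
    target_triples: List[Triple],
    entity_links: Dict[str, str],
) -> Tuple[List[Tuple[Triple, Triple]], List[Triple]]:
    """Split source triples into aligned pairs and enrichment candidates.

    A source triple is "aligned" when its subject and object both have a
    cross-lingual link and the mapped triple exists in the target graph.
    Every other source triple is returned as a candidate that a trained
    translation model could later translate into the target language.
    Predicates are assumed to be shared between both graphs (hypothetical
    simplification).
    """
    target_set = set(target_triples)
    aligned: List[Tuple[Triple, Triple]] = []
    candidates: List[Triple] = []
    for s, p, o in source_triples:
        mapped_s = entity_links.get(s)
        mapped_o = entity_links.get(o)
        if mapped_s is not None and mapped_o is not None:
            mapped = (mapped_s, p, mapped_o)
            if mapped in target_set:
                aligned.append(((s, p, o), mapped))
                continue
        candidates.append((s, p, o))
    return aligned, candidates


# Toy data with made-up shorthand prefixes (not real DBpedia IRIs).
english = [
    ("en:Germany", "dbo:capital", "en:Berlin"),
    ("en:Berlin", "dbo:mayor", "en:Some_Mayor"),
]
german = [("de:Deutschland", "dbo:capital", "de:Berlin")]
links = {"en:Germany": "de:Deutschland", "en:Berlin": "de:Berlin"}

aligned_pairs, enrichment_candidates = extract_aligned_triples(english, german, links)
```

In this toy setup the aligned pairs would serve as training data for the translation model, while the remaining source triples are the facts that could be translated to enrich the target graph.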


In this GSoC project, the goal is not to officially enrich the DBpedia KG, but rather to investigate THOTH with other neural network architectures and different knowledge graph embedding techniques, in order to improve downstream NLP tasks such as Machine Translation and Question Answering.

The project may allow users to artificially enrich low-resource DBpedia KGs for use in essential NLP tasks and/or to augment knowledge-graph-based machine learning models.

Warm-up tasks:

  • Read the papers:


Transformer:

Survey on Knowledge Graph Embeddings:


Diego Moussallem


Neural Networks, NLP, Semantic Web



I am interested in this project idea for GSoC 2020. What kind of code-level contributions can I make to this project?


Hi, the idea is to have a framework containing different NN algorithms to perform this enrichment. So you can simply rely on TensorFlow libraries to build it, and reuse some implementations from OpenNMT or whatever framework you prefer. Did I answer your question? It was a bit vague to me.

Yes, you have answered my question.

I am Ashutosh Sahu, a CSE student of IIT Bhilai, India.
I would love to contribute to this project, both to benefit the community and to gain knowledge in different fields. I hope it's not too late to join!

However, I have a question. I will be starting the warm-up tasks pretty soon. Where should I post my progress and proof of completion? Should I post it here, or is there a separate place to track progress?

You do not need to post anything here regarding your progress. Please have a look at You need to start working on your proposal and submit it by March 31st. You can contact me via email if you have any questions.


I’m Ankit Kumar, a junior undergrad at IIT Delhi, India.

I have worked significantly on Knowledge Graphs at my university, and would love to be a part of this project. I have already read the mentioned papers. Is there anything else I need to do?


My name is Shunsuke Kando. I’ll be a master's student in Computer Science at the University of Tokyo starting this April. I’m planning to do research on knowledge graphs.
I have experience implementing an OpenIE system called ReNoun, so I have basic knowledge of this area. Also, in my senior thesis, I evaluated the linguistic capacity of BERT with PyTorch, so I’m accustomed to using deep learning frameworks.

I’d like to contribute to this topic using the above-mentioned skills. There are a few things I want to confirm first:

  1. If I understood the goals correctly, this topic is closer to research than to contributing to OSS. Is that right?
  2. I’d like to confirm the concrete content of the task. Is it to apply other neural network techniques, such as the Transformer, or other knowledge graph embedding techniques to THOTH and evaluate the result?
  3. I’m curious about how to apply a knowledge graph to the Question Answering task. I know several QA benchmarks such as SQuAD, but I only know that they can be solved by directly using neural network based models. Are there any suitable papers to read? Since I think it’s a good idea to evaluate the quality of a knowledge graph by solving downstream tasks, this could be essential information.

I’m so sorry for the late contact, but I’m serious. I’m looking forward to your reply.

Thanks and regards,
Shunsuke Kando

Hi @kando.s,

Your background is quite interesting, and I am looking forward to seeing your proposal. Now, to answer your questions:

  1. IMHO, everything in GSoC is related to research.
  2. Correct, and of course make the code available.
  3. I very much like this paper -> Please have a look. Also see this paper; this is how we are going to evaluate our system.

No, just write the proposal.

Thanks @diegomoussallem for the response. If possible, can you please share your gmail id with me? Just wanted feedback on my proposal in case it isn’t too much trouble.

Thank you for the detailed answer. I’ll get started on my proposal with it in mind!

Hi @diegomoussallem. Sorry for the late submission.
I have shared my draft proposal. Could you please take a look, so that I can go ahead and submit it?

Hi, I am sorry, it is not possible. You can create a Google Doc and share it with me via private message.

Hi, I didn’t receive anything. Where did you share it with me?

I found it.