Online Tool to generate RDF from DBpedia abstracts - GSoC2020

mariano_rico · March 13, 2020, 4:28pm

Description

With recent advances (e.g. SyntaxNet) in the analysis of natural language texts, converting text into RDF triples is becoming a real possibility. This project will apply these ideas to a real use case: DBpedia. We will combine the power of syntactic analyzers with the benefits of Named Entity identifiers (like Spotlight) to generate highly trustworthy RDF triples from the textual information (long abstract) about a given DBpedia resource.

Goals

To create an online tool that generates a new nt (N-Triples) file with the triples proposed for all DBpedia resources. This tool could be used by the DBpedia extraction process to provide a new nt file in the DBpedia downloads.
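
As a rough illustration of the output side of this goal, the sketch below serializes proposed triples as N-Triples lines, one triple per line. It is only a minimal sketch: the function names (`escape_literal`, `to_ntriple`) and the example triples are hypothetical and not part of the project, and the escaping covers only the common cases (backslash, double quote, newline, carriage return).

```python
def escape_literal(text: str) -> str:
    """Escape the characters that must be escaped inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriple(subject: str, predicate: str, obj: str,
               is_literal: bool = False, lang: str = "") -> str:
    """Serialize one triple as a single N-Triples line (hypothetical helper)."""
    if is_literal:
        obj_term = '"' + escape_literal(obj) + '"'
        if lang:
            obj_term += "@" + lang
    else:
        obj_term = "<" + obj + ">"
    return "<" + subject + "> <" + predicate + "> " + obj_term + " ."


# Illustrative triples that an extractor might propose for one resource
# (made up for this example, not real extraction output).
proposed = [
    ("http://dbpedia.org/resource/Madrid",
     "http://dbpedia.org/ontology/country",
     "http://dbpedia.org/resource/Spain", False, ""),
    ("http://dbpedia.org/resource/Madrid",
     "http://www.w3.org/2000/01/rdf-schema#comment",
     'The "capital" of Spain', True, "en"),
]

lines = [to_ntriple(*t) for t in proposed]
print("\n".join(lines))
```

Writing `lines` to a file with a `.nt` extension would give a file in the same format as the existing DBpedia downloads. A real implementation would more likely rely on an RDF library than on hand-written escaping.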

Impact

Increase the number of RDF triples for a given DBpedia resource.

Warm up tasks

Experience with SyntaxNet or any other NLP tool capable of syntactic analysis of natural language. Here we have to strike a balance between power and the number of supported languages.
Fluency with RDF and the DBpedia datasets (downloads).
In GSoC 2019 we proposed this task, but we could not finish the online tool. The software resources generated are available here (https://github.com/sahitpj/GSoC-codebase), but it is not mandatory to use them.

Mentors

Mariano Rico

Keywords

NLP, text parsing, syntactic analysis, RDF generation

pritideo · March 15, 2020, 3:13pm

Hello, my name is Priti Deo. I am currently pursuing my B.Tech degree at Vishwakarma Institute Of Technology, Pune, India. I have some prior experience with NLP. I am interested in this project and want to contribute to it. I am going through the warm-up tasks. Are there any further instructions that would help me understand the project better?

adityamalte · March 15, 2020, 6:53pm

Hi Mariano,
I believe we’ve had a discussion regarding this project.
With recent advances in NLP and my experience in the same, I think this problem should be solvable in a reasonable amount of time.
One good thing is that DBpedia abstracts are a lot cleaner than other data sources that I have been challenged with.
Do let me know your thoughts.
Thanks
Aditya

mariano_rico · March 16, 2020, 8:58am

Please send me (mariano.rico@upm.es) your CV, focused on (1) your experience with open source projects, (2) NLP techniques and libraries used to extract structured information from text, and (3) experience creating web applications.