Description
With recent advances in the analysis of natural-language text (e.g. SyntaxNet, spaCy), converting text into RDF triples is becoming a real possibility. This project will apply these ideas to a real use case: DBpedia. We will combine the power of syntactic analyzers with the benefits of named-entity identifiers (such as DBpedia Spotlight) to generate highly trustworthy RDF triples from the textual information (the long abstract) about a given DBpedia resource.
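As a rough illustration of the idea, the sketch below turns one already-parsed sentence into a candidate triple. It is not the project's implementation: the `Token` structure only mimics the kind of output a dependency parser produces, `ENTITY_LINKS` is a hypothetical stand-in for an entity linker such as Spotlight, and `PREDICATES` is a hypothetical verb-to-property table chosen purely for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    """Hypothetical, simplified view of one parsed token."""
    text: str    # surface form (multi-word entities assumed already merged)
    lemma: str   # base form of the word
    dep: str     # dependency label, e.g. "nsubj", "ROOT", "dobj"
    head: int    # index of the head token in the sentence


# Assumption: a lookup table standing in for a real entity linker.
ENTITY_LINKS = {
    "Cervantes": "http://dbpedia.org/resource/Miguel_de_Cervantes",
    "Don Quixote": "http://dbpedia.org/resource/Don_Quixote",
}

# Assumption: an illustrative mapping from verb lemmas to properties.
PREDICATES = {
    "write": "http://dbpedia.org/ontology/notableWork",
}


def extract_triple(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object) URIs, or None if any part is unknown."""
    root = next((i for i, t in enumerate(tokens) if t.dep == "ROOT"), None)
    if root is None:
        return None
    # Subject and direct object must both attach to the root verb.
    subj = next((t.text for t in tokens if t.dep == "nsubj" and t.head == root), None)
    obj = next((t.text for t in tokens if t.dep == "dobj" and t.head == root), None)
    s = ENTITY_LINKS.get(subj)
    p = PREDICATES.get(tokens[root].lemma)
    o = ENTITY_LINKS.get(obj)
    # Emit a triple only when every component was resolved.
    if s is None or p is None or o is None:
        return None
    return (s, p, o)


# "Cervantes wrote Don Quixote", written out by hand as a parse.
SENTENCE = [
    Token("Cervantes", "Cervantes", "nsubj", 1),
    Token("wrote", "write", "ROOT", 1),
    Token("Don Quixote", "Don Quixote", "dobj", 1),
]

triple = extract_triple(SENTENCE)
```

Dropping any sentence whose subject, verb, or object cannot be resolved is one simple way to favour trustworthy triples over coverage; a real system would need a much richer treatment of syntax and of the predicate mapping.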
Goals
To create an online tool that generates a new N-Triples (.nt) file containing the triples proposed for all DBpedia resources. This tool could be used by the DBpedia extraction process to provide a new .nt file in the DBpedia downloads.
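For readers unfamiliar with the target format, the following minimal sketch shows how proposed triples could be written out as N-Triples, one statement per line. The function names (`escape_literal`, `to_ntriples_line`, `dump_ntriples`) are hypothetical, and the sketch assumes the URIs are already valid IRIs and handles only plain string literals.

```python
import io
from typing import Iterable, TextIO, Tuple

# (subject IRI, predicate IRI, object, object_is_literal)
Triple = Tuple[str, str, str, bool]


def escape_literal(value: str) -> str:
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples_line(s: str, p: str, o: str, o_is_literal: bool = False) -> str:
    """Format one statement; IRIs go in angle brackets, literals in quotes."""
    obj = f'"{escape_literal(o)}"' if o_is_literal else f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


def write_ntriples(triples: Iterable[Triple], out: TextIO) -> None:
    """Write each triple on its own line, as the format requires."""
    for s, p, o, is_literal in triples:
        out.write(to_ntriples_line(s, p, o, is_literal) + "\n")


def dump_ntriples(triples: Iterable[Triple]) -> str:
    """Convenience wrapper returning the serialized text as a string."""
    buffer = io.StringIO()
    write_ntriples(triples, buffer)
    return buffer.getvalue()
```

In practice an established RDF library would likely be preferable to hand-rolled serialization; the sketch is only meant to make the shape of the output file concrete.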
Impact
Increase the number of RDF triples for a given DBpedia resource.
Warm up tasks
Experience with SyntaxNet or any other NLP tool capable of syntactic analysis of natural language. Here we have to strike a balance between power and the number of supported languages.
Fluency with RDF and the DBpedia datasets (downloads).
In GSoC 2019 we proposed this task, but we could not finish the online tool. The software resources generated are available here, but it is not mandatory to use them.
In GSoC 2020 we started this task again, but after a few weeks of work the selected student quit. Hopefully 2021 will be the year for this project.
Mentors
Mariano Rico
Keywords
NLP, text parsing, syntactic analysis, RDF generation