Web app to generate RDF from DBpedia abstracts - GSoC2021

Description

With recent advances in natural language text analysis (e.g. SyntaxNet, spaCy), converting text into RDF triples is becoming a real possibility. This project will apply these ideas to a real use case: DBpedia. We will combine the power of syntactic analyzers with the benefits of Named Entity identifiers (like Spotlight) to generate highly reliable RDF triples from the textual information (the long abstract) about a given DBpedia resource.
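The combination described above can be sketched roughly as follows. It assumes the sentence has already been dependency-parsed (spaCy-style labels such as `nsubj`, `dobj`, `compound`) and that its entities have been linked to DBpedia URIs, as a linker like Spotlight might return them. The `Token` class, the helper functions, the `http://example.org/predicate/` namespace and the hard-coded links are all hypothetical. A real pipeline would likely need to map verbs to DBpedia ontology properties instead of minting predicates from the verb text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical predicate namespace, used only for illustration.
PRED_NS = "http://example.org/predicate/"

Triple = Tuple[str, str, str]


@dataclass
class Token:
    text: str
    dep: str   # dependency label, spaCy-style ("nsubj", "dobj", "compound", "ROOT")
    head: int  # index of the syntactic head; the root points to itself


def children(tokens: List[Token], i: int, dep: str) -> List[int]:
    """Indices of tokens attached to token i with the given dependency label."""
    return [j for j, t in enumerate(tokens) if t.head == i and t.dep == dep and j != i]


def span_text(tokens: List[Token], i: int) -> str:
    """Token i preceded by its compound modifiers, e.g. 'Don' + 'Quixote'."""
    parts = [tokens[j].text for j in children(tokens, i, "compound")]
    return " ".join(parts + [tokens[i].text])


def extract_triples(tokens: List[Token], links: Dict[str, str]) -> List[Triple]:
    """Emit (subject, predicate, object) for each subject-verb-object pattern
    whose subject and object were both linked to a DBpedia resource."""
    triples: List[Triple] = []
    for v, tok in enumerate(tokens):
        if tok.dep != "ROOT":
            continue
        for s in children(tokens, v, "nsubj"):
            for o in children(tokens, v, "dobj"):
                s_uri = links.get(span_text(tokens, s))
                o_uri = links.get(span_text(tokens, o))
                # Skip the pattern unless both ends are linked entities.
                if s_uri and o_uri:
                    triples.append((s_uri, PRED_NS + tok.text.lower(), o_uri))
    return triples


if __name__ == "__main__":
    # "Cervantes wrote Don Quixote", parsed by hand for this example.
    tokens = [
        Token("Cervantes", "nsubj", 1),
        Token("wrote", "ROOT", 1),
        Token("Don", "compound", 3),
        Token("Quixote", "dobj", 1),
    ]
    # Hypothetical linker output (surface form -> DBpedia resource).
    links = {
        "Cervantes": "http://dbpedia.org/resource/Miguel_de_Cervantes",
        "Don Quixote": "http://dbpedia.org/resource/Don_Quixote",
    }
    for triple in extract_triples(tokens, links):
        print(triple)
```

Requiring both ends of a pattern to be linked entities is one simple way to favour precision over recall, in line with the goal of producing highly reliable triples.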

Goals

To create an online tool that generates a new nt file with the proposed triples for all DBpedia resources. This tool could be used by the DBpedia extraction process to provide a new nt file in the DBpedia downloads.

Impact

Increase the number of RDF triples for a given DBpedia resource.

Warm up tasks

Experience with SyntaxNet or any other NLP tool that provides a syntactic analyzer for natural language. Here we have to strike a balance between power and the number of supported languages.
Fluency with RDF and the DBpedia datasets (downloads).
In GSoC 2019 we proposed this task, but we could not finish the online tool. The software resources generated are available here, but it is not mandatory to use them.
In GSoC 2020 we started this task again, but after a few weeks of work the selected student quit :frowning:. Hopefully 2021 will be the year for this project :slight_smile:

Mentors

Mariano Rico

Keywords

NLP, text parsing, syntactic analysis, RDF generation


Hey! I'm Sneh, a sophomore at Vanderbilt University studying CS + Econ. This project seems like a good learning opportunity, and I'm hoping you can point me in the right direction to start contributing in any way. I have some experience creating web apps with Flask, but I'm always looking to learn new technical skills!

@mariano_rico I found this project quite interesting and am starting the warm-up tasks. Meanwhile, can you tell me more about nt files?

Hi Sneh,

thanks for your interest. I think a second-year level is too low for this task. We are using advanced AI techniques, and the technical skills required for this task are high. Anyway, follow the links in the proposal, have a look at these technologies and standards, and let me know later if you want to join this adventure :slight_smile:

Best,

-Mariano

Hi dhruvkabariya,

nt (N-Triples) files are one of the formats in which RDF data can be serialized. Their main benefit is that they are quite "readable" by humans.
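N-Triples is line-based: each line holds exactly one triple, written as subject, predicate and object followed by ` .`, with IRIs in angle brackets and literals in double quotes (optionally followed by a language tag such as `@en`). A minimal serializer might look like the sketch below, assuming triples are held as simple hypothetical `IRI`/`Literal` objects rather than any particular RDF library's classes. It does not handle blank nodes or datatyped literals.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None  # language tag, e.g. "en"


Term = Union[IRI, Literal]


def escape_literal(s: str) -> str:
    """Escape the characters N-Triples forbids unescaped inside a quoted literal."""
    return (s.replace("\\", "\\\\").replace('"', '\\"')
             .replace("\n", "\\n").replace("\r", "\\r"))


def term_to_nt(t: Term) -> str:
    """Serialize one term: <iri> or "literal" with an optional @lang tag."""
    if isinstance(t, IRI):
        return f"<{t.value}>"
    lit = f'"{escape_literal(t.value)}"'
    return f"{lit}@{t.lang}" if t.lang else lit


def to_ntriples(triples: Iterable[Tuple[IRI, IRI, Term]]) -> str:
    """One triple per line, each terminated by ' .'."""
    return "".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} .\n" for s, p, o in triples
    )


if __name__ == "__main__":
    print(to_ntriples([(
        IRI("http://dbpedia.org/resource/Don_Quixote"),
        IRI("http://www.w3.org/2000/01/rdf-schema#label"),
        Literal("Don Quixote", "en"),
    )]), end="")
```

Because every triple is an independent line, nt files produced for different resources can simply be concatenated, which suits a bulk export like the DBpedia downloads. The output should be written as UTF-8 (e.g. `open(path, "w", encoding="utf-8")`), the encoding N-Triples requires.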

Do not hesitate to ask another question. See you soon!

Best,

-Mariano

Hi @mariano_rico, I'm Mayank, a 3rd-year C.S.E. undergraduate at BITS Pilani. I saw this project and found it interesting. Last year I did an internship in which we applied NER with custom tags to a dataset scraped from an e-commerce website. I built a model using bidirectional LSTMs with softmax activation and the Adam optimizer. This project seems a good opportunity to contribute to this community and to further increase my knowledge and expertise.
I wanted to ask about the online tool from GSoC 2019: do you mean that all the NLP and NER tasks were finished and only the web implementation was left?