Extending the Extraction Framework with Citation, Commons and Lexeme Extractors - GSoC 2020


DBpedia is a crowd-sourced community effort to extract structured content from the various Wikimedia projects and make it publicly available to everyone on the Web. This project will extend the DBpedia extraction framework (https://github.com/dbpedia/extraction-framework), which is continuously developed by the community, with extractors for citation, Commons and lexeme information.


The student will develop the modules required to parse the information from each of these sources. The modules will be used to extract a wider range of knowledge from the Wikimedia projects, which will be made openly available to a community with different interests and across language editions.


The triples created for each type of knowledge will be published for community use.
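
As a rough illustration of the two steps described above (parsing a source, then producing triples), the following Python sketch pulls {{cite ...}} templates out of a wikitext snippet and emits one N-Triples line per cited URL. It is not the Extraction Framework's own code, which is written in Scala; the function names, the regular expression and the property IRI are all assumptions made for this example.

```python
import re

# Base IRI for DBpedia resources.
DBR = "http://dbpedia.org/resource/"
# Hypothetical property IRI, used here for illustration only.
CITATION_PROP = "http://example.org/ontology/citation"

# Matches simple {{cite web|...}}, {{cite book|...}} and similar templates.
# Nested templates are deliberately not handled in this sketch.
CITE_RE = re.compile(r"\{\{\s*cite\s+\w+\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_citations(wikitext):
    """Return one dict of named parameters per citation template found."""
    citations = []
    for match in CITE_RE.finditer(wikitext):
        fields = {}
        for part in match.group(1).split("|"):
            if "=" in part:
                # Split on the first "=" only, so URLs with query strings survive.
                key, value = part.split("=", 1)
                fields[key.strip().lower()] = value.strip()
        citations.append(fields)
    return citations


def to_ntriples(page_title, citations):
    """Build one N-Triples line per citation that carries a url parameter."""
    subject = DBR + page_title.replace(" ", "_")
    lines = []
    for citation in citations:
        url = citation.get("url")
        if url:
            lines.append(f"<{subject}> <{CITATION_PROP}> <{url}> .")
    return lines


if __name__ == "__main__":
    sample = "Some text.{{cite web |url=https://example.org/a |title=Example}} More text."
    for line in to_ntriples("Berlin Wall", parse_citations(sample)):
        print(line)
```

A real extractor would work on the framework's parsed page representation rather than on raw text with a regular expression, and would use the properties defined by the DBpedia ontology.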

Warm-up tasks

Preliminary experience with the Extraction Framework




Extraction framework, text parsing, RDF generation

Would be willing to join in as co-mentor