Find 10 or 100 errors in DBpedia

Effort

1 day

Skills

curiosity, attention to detail, spreadsheets

Description

DBpedia contains several classes of errors: data may be incorrect or missing. The causes vary, for instance 1) wrong information or wrong formatting in Wikipedia or another original source; 2) mistakes made by the DBpedia Extraction Framework (DEF) during automatic extraction; 3) errors in the mappings or in the ontology. In this task you will browse DBpedia entities, read the corresponding Wikipedia pages, (optionally) run SPARQL queries via the Web UI, and analyze the results. Your objective is to judge whether the information is correct and to identify the likely source of each error. You will log your findings in a spreadsheet, which one of the core DBpedia developers will review with you to help determine the source of each error.
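If you prefer to keep the log in a file that any spreadsheet program can open, the findings could be recorded as CSV. The following is only a minimal sketch: the column names, the source labels, and the example row are assumptions made for illustration, not a format prescribed by the DBpedia developers.

```python
import csv
import io

# Hypothetical column names; adapt them to the spreadsheet you actually use.
COLUMNS = ["entity", "property", "dbpedia_value",
           "wikipedia_value", "suspected_source", "notes"]

# Labels for the three error sources named in the description, plus "unknown".
SOURCES = {"wikipedia", "extraction_framework", "mappings_or_ontology", "unknown"}


def make_row(entity, prop, dbpedia_value, wikipedia_value,
             suspected_source="unknown", notes=""):
    """Build one log entry, rejecting source labels outside SOURCES."""
    if suspected_source not in SOURCES:
        raise ValueError(f"unknown error source: {suspected_source!r}")
    return {
        "entity": entity,
        "property": prop,
        "dbpedia_value": dbpedia_value,
        "wikipedia_value": wikipedia_value,
        "suspected_source": suspected_source,
        "notes": notes,
    }


def write_log(rows, out):
    """Write a header line followed by one CSV line per finding."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Invented example entry; the entity and values are placeholders, not real data.
example = make_row(
    "http://dbpedia.org/resource/Example_Entity",
    "dbo:populationTotal",
    "12",
    "12000",
    suspected_source="extraction_framework",
    notes="value looks truncated",
)
buffer = io.StringIO()
write_log([example], buffer)
print(buffer.getvalue())
```

Restricting the suspected source to a small fixed set of labels makes it easier to count findings per category later, which is what the review with a core developer is meant to support.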

Impact

Data quality is one of the most important challenges in open data sets like DBpedia. By finding and categorizing errors, you will learn more about how DBpedia works and help us draft a plan of action that will efficiently improve our data quality by tackling the largest sources of errors first.

The SPARQL endpoint URL is http://dbpedia.org/sparql
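As an illustration of the kind of query you might run, the sketch below looks for entities whose coordinates fall outside the valid latitude and longitude ranges. It assumes that coordinates are stored with the W3C `geo:lat` and `geo:long` properties as numeric literals; the helper `build_request_url` is a hypothetical name introduced here. The query text can be pasted into the Web UI as-is, and the Python wrapper only builds the request URL without contacting the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumption: coordinates use the W3C WGS84 vocabulary (geo:lat / geo:long)
# and are typed as numbers, so the range comparison in FILTER applies.
QUERY = """
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?entity ?lat ?long WHERE {
  ?entity geo:lat ?lat ;
          geo:long ?long .
  FILTER (?lat < -90 || ?lat > 90 || ?long < -180 || ?long > 180)
}
LIMIT 100
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Return a GET URL that passes the query in the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


# Print the URL; open it in a browser or fetch it with any HTTP client.
print(build_request_url(QUERY))
```

Each row returned is a candidate error rather than a confirmed one: compare it with the Wikipedia page before logging it, since the value may already be wrong at the source.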
This task is related to our continuous integration testing. If you have written a SPARQL query, you can go the extra mile by rewriting it in SHACL and pushing it here: https://github.com/dbpedia/extraction-framework/blob/master/dump/src/test/resources/custom-shacl-tests.ttl

A SHACL test for geo-coordinates already exists there.