NOTE: We keep the list below as a central point for tasks that we have already clearly identified and that are relevant to the core infrastructure. Proposals for extensions are also welcome; they can be initiated under DBpedia Project Proposals and later merged into this list.
List of ideas and tasks for volunteers to help improve DBpedia.
Write some cool queries
and post them here: https://forum.dbpedia.org/c/support/query-dbpedia-sparql/16 and tag them with
Migrating the documentation
This task is related to the Easier Download GUI task below. Originally, we kept all the comments in a DataID file (http://downloads.dbpedia.org/2016-10/core/2016-10_dataid_core.ttl). We now use POM files and Markdown to keep the documentation up to date: https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config/-/tree/master/databus-poms%2Fdbpedia
Although we have already spent a lot of effort migrating the documentation, much of it is still missing, as can be seen here: https://databus.dbpedia.org/dbpedia/generic
Easier Download GUI
We had a widget that worked on the dataid in section 3 Datasets: it loaded the JSON file and rendered it. Such a widget would help many people identify the datasets they need. The data for it can now be retrieved from the SPARQL endpoint of the Databus, see the query here
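As a rough illustration of how such a widget could fetch its data, the sketch below builds a SPARQL request for the Databus. It is not the query referenced above: the endpoint URL, the `format` parameter, the query shape, and the DataID/DCAT property names are all assumptions and may need adjusting to the actual Databus schema.

```python
from urllib.parse import urlencode

# Assumption: the Databus SPARQL endpoint is reachable at this URL.
DATABUS_SPARQL_ENDPOINT = "https://databus.dbpedia.org/sparql"


def build_dataset_query(group_uri: str, limit: int = 100) -> str:
    """Return a SPARQL query listing dataset titles and download URLs.

    The properties used here (dataid:group, dcat:distribution,
    dcat:downloadURL) are assumptions about the Databus metadata model.
    """
    return f"""
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>
PREFIX dct:    <http://purl.org/dc/terms/>

SELECT DISTINCT ?title ?file WHERE {{
  ?dataset dataid:group <{group_uri}> ;
           dct:title ?title ;
           dcat:distribution ?dist .
  ?dist dcat:downloadURL ?file .
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str, endpoint: str = DATABUS_SPARQL_ENDPOINT) -> str:
    """Encode the query as a GET URL; the 'format' parameter is an assumption."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    q = build_dataset_query("https://databus.dbpedia.org/dbpedia/generic")
    print(build_request_url(q))
```

The script only prints the request URL; a widget would fetch that URL and render the returned JSON bindings as a list of datasets with download links.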
Small change in the Dockerized version of Spotlight
See DBpedia spotlight long input returns code 414 for details
(in progress, almost done) restart DBpedia Spotlight project
Moved to Consolidate Update Interval of DBpedia Spotlight
What we need is this:
- Spotlight needs training data from Wikipedia: Wikipedia dumps are parsed and then a model is created.
- These models should be created at least every three months for all Wikipedias (we can provide servers) and published on the Databus.
- From the Databus, we can modify the existing Spotlight Docker image to autoload and deploy the models. This image can then be deployed at all the chapters.
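The three-month update interval above could be enforced with a simple staleness check, sketched below. The function name, the 92-day approximation of "three months", and the example dates are hypothetical; how publication dates are actually obtained from the Databus is left open.

```python
from datetime import date, timedelta

# Assumption: "every three months" is approximated as 92 days.
MAX_MODEL_AGE = timedelta(days=92)


def models_to_rebuild(last_published: dict[str, date], today: date) -> list[str]:
    """Return the language codes whose latest model is older than MAX_MODEL_AGE."""
    return sorted(
        lang
        for lang, published in last_published.items()
        if today - published > MAX_MODEL_AGE
    )


if __name__ == "__main__":
    # Hypothetical publication dates, for illustration only.
    published = {"en": date(2024, 5, 1), "de": date(2024, 1, 1)}
    print(models_to_rebuild(published, today=date(2024, 6, 1)))
```

A scheduled job could run such a check and trigger model creation only for the languages it returns.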
Languages with missing redirects/disambiguations/instance-type
The model-quickstarter tool can be used to create DBpedia Spotlight models for a specific language. Producing a model requires the following artifacts: redirects, disambiguations and instance-types. However, some languages are missing one or more of these artifacts, e.g., Swedish (sv-SE), Turkish (tr_TR), Danish (da_DK). You can run this SPARQL query to list all languages with missing artifacts. These missing artifacts need to be produced to create a complete Spotlight model for the corresponding language.
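The completeness check described above amounts to a set difference per language. The sketch below is not the SPARQL query mentioned in the text; it assumes the available artifacts per language have already been collected into a dictionary, and the artifact labels and the placeholder language codes "xx" and "yy" are hypothetical.

```python
# Assumption: these labels stand in for the three required artifacts.
REQUIRED_ARTIFACTS = frozenset({"redirects", "disambiguations", "instance-types"})


def missing_artifacts(available: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each language to the required artifacts it lacks.

    Languages that have all required artifacts are omitted from the result.
    """
    report = {}
    for lang, artifacts in available.items():
        missing = sorted(REQUIRED_ARTIFACTS - artifacts)
        if missing:
            report[lang] = missing
    return report


if __name__ == "__main__":
    # Placeholder input, not real Databus data.
    available = {
        "xx": {"redirects", "disambiguations", "instance-types"},
        "yy": {"redirects"},
    }
    print(missing_artifacts(available))
```

Only languages that appear in the result still need artifacts produced before a complete Spotlight model can be built for them.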
DBpedia Extraction Framework
General debugging of the DBpedia Extraction Framework (Scala/Java)
If you go to extraction-framework and run "mvn install", you can check all thrown exceptions and try to fix them. For more detailed debugging, "cd dump ; mvn test" currently runs tests on text extraction, and you can disable two of them if you want to focus on one, around line 122: https://github.com/dbpedia/extraction-framework/blob/master/dump/src/test/scala/org/dbpedia/extraction/dump/MinidumpTests.scala#L122 .
Issue tracker fixing and migrating tests to the framework
Issues in the CI-Test category should be migrated to tests on the minidump. Others need to be tagged properly or reviewed and closed. The process is poorly documented at the moment, so this task is currently hard; we need to simplify it with better documentation.