Tasks for Volunteers

kurzum · October 2, 2019, 10:44am

NOTE: We keep the list below as a central point for tasks that we have already clearly identified and that are relevant to the core infrastructure. Proposals for extensions are also welcome; they can be started under DBpedia Project Proposals and later merged into this list.

List of ideas and tasks for volunteers to help improve DBpedia.

DBpedia Usability

Write some cool queries

Post them here: https://forum.dbpedia.org/c/support/query-dbpedia-sparql/16 and tag them with coolqueries.

Migrating the documentation

This is related to the download GUI below. Originally, we kept all the comments in a DataID file: http://downloads.dbpedia.org/2016-10/core/2016-10_dataid_core.ttl Now we use pom files and Markdown to keep the documentation up to date: https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config/-/tree/master/databus-poms%2Fdbpedia
Although we have already spent a lot of effort migrating the documentation, much of it is still missing, as can be seen here: https://databus.dbpedia.org/dbpedia/generic

Easier Download GUI

We had a widget that worked on the DataID in section 3, Datasets: it loaded the JSON file and rendered it. A widget like this would help many people identify the datasets they need. The data for it can now be retrieved from the SPARQL endpoint of the Databus; see the query here
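A widget like this only needs the rows returned by the Databus query. As a rough sketch (not the original widget code), the Python below uses only the standard library to send a query over the standard SPARQL 1.1 Protocol and flatten the standard SPARQL JSON results into rows that a widget could render. The endpoint URL is an assumption and may have changed; the actual query is the one referenced in the post.

```python
import json
import urllib.parse
import urllib.request

# Assumption: illustrative Databus SPARQL endpoint; check the Databus site for the current one.
DATABUS_SPARQL = "https://databus.dbpedia.org/repo/sparql"


def build_request(query: str, endpoint: str = DATABUS_SPARQL) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(results: dict) -> list:
    """Flatten SPARQL JSON results into plain dicts mapping variable -> value."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound (e.g. OPTIONAL) variables are simply absent from a binding.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def fetch_rows(query: str, endpoint: str = DATABUS_SPARQL) -> list:
    """Send the query and return flattened rows (requires network access)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return rows_from_results(json.load(response))
```

Keeping request building and result flattening separate means the rendering side can be tested against saved JSON without touching the endpoint.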

DBpedia Spotlight

Small change in the Dockerized version of Spotlight

See "DBpedia spotlight long input returns code 414" for details.
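HTTP 414 means "URI Too Long": with a GET request the whole input text travels in the query string. The thread does not say what the actual change to the Docker setup is (the linked topic has the details), but on the client side a common workaround is to send the text in a POST body instead. A minimal sketch, assuming a local Spotlight container at http://localhost:2222/rest/annotate that accepts form-encoded POST requests; the helper name build_annotate_request is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: a locally running Spotlight container exposing its REST API here.
SPOTLIGHT_ANNOTATE = "http://localhost:2222/rest/annotate"


def build_annotate_request(text: str, confidence: float = 0.5,
                           endpoint: str = SPOTLIGHT_ANNOTATE) -> urllib.request.Request:
    """Put the input text into a form-encoded POST body instead of the URL.

    The URL stays short no matter how long the text is, so the server's
    URI length limit (the cause of 414 responses) is not hit.
    """
    body = urllib.parse.urlencode({"text": text, "confidence": confidence}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_annotate_request(long_text))`, which requires the container to be running.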

Restart the DBpedia Spotlight project (in progress, almost done)

Moved to Consolidate Update Interval of DBpedia Spotlight
What we need is this:

Spotlight needs training data from Wikipedia: Wikipedia dumps are parsed and a model is created from them.
These models should be created at least every three months for all Wikipedias (we can provide servers) and published on the Databus.
We can then modify the existing Spotlight Docker image to automatically load the models from the Databus and deploy them. This image can then be deployed at all the chapters.
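To keep every language within the three-month interval, a scheduler has to decide which models are due for a rebuild. The sketch below is purely illustrative: the helper languages_needing_rebuild is hypothetical, "three months" is approximated as 90 days, and where the last build dates come from (for example Databus metadata) is left open.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Assumption: "at least every three months" approximated as 90 days.
MAX_MODEL_AGE = timedelta(days=90)


def languages_needing_rebuild(last_built: Dict[str, Optional[date]], today: date,
                              max_age: timedelta = MAX_MODEL_AGE) -> List[str]:
    """Return sorted language codes whose model is missing (None) or older than max_age."""
    stale = [lang for lang, built in last_built.items()
             if built is None or today - built > max_age]
    return sorted(stale)
```

For example, with an English model built 45 days ago, a German one built 167 days ago and no Swedish model yet, only "de" and "sv" are returned.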

Languages with missing redirects/disambiguations/instance-type

The model-quickstarter tool can be used to create DBpedia Spotlight models for a specific language. Producing a model requires the following artifacts: redirects, disambiguations, and instance-types. However, some languages are missing one or more of these artifacts, e.g., Swedish (sv-SE), Turkish (tr_TR), and Danish (da_DK). You can run this SPARQL query to list all languages with missing artifacts. These missing artifacts need to be produced to create a complete Spotlight model for the corresponding language.
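Once the available artifacts per language are known (for instance from the results of that SPARQL query), finding the gaps is a set difference against the three required artifacts. A minimal sketch; the artifact identifiers below are assumptions and should be matched to the names actually used on the Databus, and the helper missing_artifacts is hypothetical.

```python
from typing import Dict, List, Set

# Assumption: artifact identifiers as they might appear on the Databus.
REQUIRED_ARTIFACTS = frozenset({"redirects", "disambiguations", "instance-types"})


def missing_artifacts(available: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each language lacking a required artifact to the sorted missing ones."""
    result = {}
    for lang, artifacts in sorted(available.items()):
        missing = REQUIRED_ARTIFACTS - artifacts
        if missing:
            result[lang] = sorted(missing)
    return result
```

Extra artifacts (such as labels) are ignored; only languages with at least one gap appear in the output.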

DBpedia Extraction Framework

General debugging of the DBpedia Extraction Framework (Scala/Java)

If you go to the extraction-framework directory and run mvn install, you can check all thrown exceptions and try to fix them. For more detailed debugging, run cd dump; mvn test, which currently runs tests on the generic, mappings, and text extractions. If you want to focus on one of them, you can disable the other two around line 122: https://github.com/dbpedia/extraction-framework/blob/master/dump/src/test/scala/org/dbpedia/extraction/dump/MinidumpTests.scala#L122
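The two steps above can be scripted, for instance to rerun them after each fix. This is only a sketch: the helper build_and_test and its dry_run flag are hypothetical, and it assumes mvn is on the PATH and that repo points at the cloned extraction-framework checkout. Which of the minidump tests run is still controlled in MinidumpTests.scala, not by this script.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def build_and_test(repo: Path, dry_run: bool = False) -> List[Tuple[List[str], Path]]:
    """Run `mvn install` at the repository root, then `mvn test` in dump/.

    With dry_run=True the commands are only returned, not executed.
    """
    steps = [(["mvn", "install"], repo), (["mvn", "test"], repo / "dump")]
    if not dry_run:
        for cmd, cwd in steps:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)
    return steps
```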

Issue tracker fixing and migrating tests to the framework

Issues in the CI-Test category should be migrated to tests on the minidump. Others need to be tagged properly, or reviewed and closed. The process is poorly documented at the moment, so this task is currently hard; we need to simplify it with better documentation.

kurzum · October 2, 2019, 10:45am

aashay225 · November 14, 2019, 6:44am

Hi! I would like to work on the DBpedia Spotlight task. Please point me to further details.

SandraPraetor · November 14, 2019, 10:09am

kurzum · November 15, 2019, 9:54am

@aashay225 thanks, I moved it here: Consolidate Update Interval of DBpedia Spotlight

karankharecha · November 15, 2019, 11:33am

@kurzum
I would really like to contribute to the DBpedia project!
My current question is about a new feature for datasets that I would like to propose.
Apart from these tasks, is it possible to discuss such a new feature?

If a new feature is not an option, I would like to fix the exceptions (H1), but the link seems to be broken. Could you please guide me on this?

kurzum · November 15, 2019, 11:57am

@karankharecha Sure, new features are the best. By the way, the link to the dump is here: https://github.com/dbpedia/extraction-framework/tree/master/dump (I updated it above; thanks for reporting.)

@SandraPraetor I added a short sentence at the top of the topic. Do you know where we collect proposals? Is it the https://forum.dbpedia.org/c/general-forums/projects category? This should be clearer.

karankharecha · November 15, 2019, 1:15pm

Okay, I appreciate the response.
I have proposed two projects that I would like to work on (link below):
https://forum.dbpedia.org/c/general-forums/projects

karankharecha · November 17, 2019, 10:16am

For task H1:
I cloned this repository:
dbpedia/extraction-framework.git

I followed the steps mentioned here

There are three applications in total, named “Download”, “Server”, and “Extraction”.
But when I run any of the three applications, I get an error about the Scala instance:

I’m not able to run the project; please guide me.

Also, do you have any thoughts on the dashboard and the recommendation system? These are the two projects I proposed in:
https://forum.dbpedia.org/c/general-forums/projects

SandraPraetor · November 18, 2019, 11:52am

We could collect proposals in the Feedback & suggestions category. If it is a GSoC proposal, I guess it belongs in the Google Summer of Code program category: https://forum.dbpedia.org/c/Jobs/google-summer-of-code-program

hopver · November 18, 2019, 2:16pm

Hey,
we have started to document our setup and how we execute the mappings, generic, and wikidata extractions at https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config
There you can find a short overview and instructions on how we currently run it.

For now, the process is encapsulated in a few standalone scripts.
Over the next days, I will update the documentation to make it more readable and to explain how to adapt the extraction to your own approaches (e.g., your own DIEF code).

Further documentation about “marvin-config” (our bot) and the DIEF (including how to debug it) will be collected at http://dev.dbpedia.org/MARVIN_Release_Bot

kurzum · November 18, 2019, 4:15pm

Feedback & suggestions is too general. These are projects that enhance DBpedia, so the Projects category seems appropriate.
@SandraPraetor These can be medium to big, like https://wiki.dbpedia.org/timbr

karankharecha · November 19, 2019, 1:06pm

@hopver
Thank you for the response.
First, I want to build the complete project so that I can get familiar with the code base and start debugging to complete task H1.

karankharecha · November 19, 2019, 1:09pm

@kurzum
Is there an option like a community call?
For example, one video call per week when all or most of the mentors are free, where we can discuss further ideas, projects, and their current status.
I believe it would speed up the process a little, and students/volunteers like me could start coding as early as possible.

hopver · November 19, 2019, 6:01pm

Okay, the wiki information is a bit out of date, so I will update that part too.

For now, I have published a concise “DIEF Debugging” guide at http://dev.dbpedia.org/Debugging_DIEF. Maybe this helps; just let me know if you need more detail.

I can also mention that if you have IntelliJ IDEA (Ultimate or Community edition) installed, you can simply open the cloned repository as a project, and it will tell you (near the top) if something is missing.

Then you should be able to run, e.g., https://github.com/dbpedia/extraction-framework/blob/master/dump/src/test/scala/org/dbpedia/extraction/dump/MinidumpTests.scala by right-clicking in the file and selecting Run (green arrow). This only works for classes with a test or main method.

Please also consider http://dev.dbpedia.org/Debugging_DIEF#requirements.

kurzum · November 20, 2019, 2:42pm

@karankharecha We stopped having these because we were doing a lot of remodelling. Now would be a good time to resume them.
We could schedule something and then announce the date, so others can join in. Can you propose a time? Next Wednesday is bad, but other times work well.

karankharecha · November 20, 2019, 8:11pm

@kurzum
Any time works for me (all 7 days of the week), whenever the mentors are free.
As for proposing a time, I think Friday nights would suit the mentors, as they neither interrupt their routine or other work nor spoil their weekends.
Otherwise, any time decided with the mentors’ schedules in mind is totally fine.

So yes, we could schedule and announce a date.

shreelakshmi · January 21, 2020, 2:46pm

Hello,

I would like to contribute to DBpedia. Are the tasks mentioned here still open? Let me know how I can proceed with contributing to the project.

kurzum · March 22, 2020, 6:08pm

@shreelakshmi I updated the tasks, so you can see which ones are still there. Sorry for the late reply.

m1ci · September 23, 2020, 9:45am

Hi @shreelakshmi,

a DBpedia hackathon is currently ongoing, so you might be interested in joining it: https://wiki.dbpedia.org/events/dbpedia-autumn-hackathon-2020

In particular, the Improve DBpedia track might be interesting for you; see the full description of the track: https://docs.google.com/document/d/1XeZ-R9tOq09W0hcQk34OUi2dfWVH50fjBjqD0sABSAQ/edit#

To join the Improve DBpedia track, join the #improve-dbpedia Slack channel. If you are not on the DBpedia Slack yet, join via https://dbpedia-slack.herokuapp.com

Best,
Milan