DBpedia Dataset 2019-08-30 (Pre-Release)

The DBpedia release 2019-08-30 can now be found here:
https://databus.dbpedia.org/dbpedia/collections/pre-release-2019-08-30
Update: the collection URI was changed from https://databus.dbpedia.org/dbpedia/collections/release-2019-08-30 to the "pre-" URI above, so that it won't be mistaken for the final release.

How to retrieve the data
Tools for easier download and usage of collection data are in development. Until then, please follow these steps:

  • Retrieve the data query: visit the collection page and click Actions > Copy Query to Clipboard, or run curl https://databus.dbpedia.org/dbpedia/collections/pre-release-2019-08-30 -H "Accept: text/sparql"
  • Run the query against https://databus.dbpedia.org/repo/sparql to get the list of downloadable files. Make sure to use a POST request, since the query exceeds the maximum length of a GET request.
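
The two steps above can be sketched as shell functions. This is a minimal sketch, assuming bash with curl available; the function names fetch_collection_query, fetch_file_table and tsv_to_urls are hypothetical, and the parsing step assumes the query returns a single column of file URLs with a header row, as tab-separated values.

```shell
#!/usr/bin/env bash
# Minimal sketch of the two retrieval steps; all function names are hypothetical.

COLLECTION="https://databus.dbpedia.org/dbpedia/collections/pre-release-2019-08-30"
SPARQL_ENDPOINT="https://databus.dbpedia.org/repo/sparql"

# Step 1: request the collection's SPARQL query via the "Accept: text/sparql" header.
fetch_collection_query() {
  curl -s -H "Accept: text/sparql" "$1"
}

# Step 2: send the query as a POST request (it is too long for a GET URL)
# and ask for tab-separated results via the format parameter.
fetch_file_table() {
  curl -s -X POST \
    --data-urlencode "query=$1" \
    --data-urlencode "format=text/tab-separated-values" \
    "$SPARQL_ENDPOINT"
}

# Turn the TSV result into one bare URL per line: skip the header row,
# keep the first column (assumed to hold the file URL) and strip the quotes.
tsv_to_urls() {
  tail -n +2 | cut -f1 | tr -d '"'
}
```

For example, fetch_file_table "$(fetch_collection_query "$COLLECTION")" | tsv_to_urls > files.txt followed by wget -i files.txt would download every file listed in the collection.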

More extensive information on DBpedia Databus Collections and how to use them will follow in the next few days.

@janfo I tested it with

# retrieve the SPARQL query from the collection
QUERY=$(curl -s -H "Accept: text/sparql" "https://databus.dbpedia.org/dbpedia/collections/pre-release-2019-08-30")
# retrieve the download URLs with the SPARQL query (POST, since the query is too long for GET)
DOWNLOADURLS=$(curl -s -X POST --data-urlencode query="$QUERY" --data-urlencode format="text/tab-separated-values" "https://databus.dbpedia.org/repo/sparql")
# drop the TSV header row and remove the double quotes around the URLs, otherwise wget fails with "Scheme missing"
DOWNLOADURLS=$(echo "$DOWNLOADURLS" | tail -n +2 | sed 's/"//g')
# download ("do ;" is a syntax error in bash, so the stray semicolon is removed)
for i in $DOWNLOADURLS ; do echo "Downloading" "$i" ; wget "$i" ; done

but it downloads too many files, e.g. https://downloads.dbpedia.org/repo/lts/generic/categories/2019.08.30/categories_lang=br_labels.ttl.bz2,
whereas the http://downloads.dbpedia.org/2016-10/core/ folder contains:

  • only English files
  • only the text group has several languages, but those use en_uris, which we don't produce yet. We could provide a transition artifact.

Update: Ah yes, and these are supposed to be the main releases for the endpoint at http://dbpedia.org/sparql
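
Until the collection itself is restricted, the file list could be filtered on the client side. A minimal sketch, assuming the language is encoded in the file name as lang=xx (as in categories_lang=br_labels.ttl.bz2) and treating files without such a tag as language-independent; keep_english is a hypothetical helper, not part of any Databus tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only English files from a list of download URLs
# read on stdin. Whitespace is first turned into newlines, so both
# newline-separated and space-separated lists work.
# Assumption: the language appears in the file name as "lang=xx";
# URLs without any "lang=" tag are kept.
keep_english() {
  tr -s ' \t' '\n\n' | awk 'NF && (/lang=en[_.]/ || !/lang=/)'
}
```

In the script earlier in the thread, adding DOWNLOADURLS=$(echo "$DOWNLOADURLS" | keep_english) before the download loop would restrict the download accordingly.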

@janfo it would also be cool to get a DataId / DCAT catalog in Turtle when sending "Accept: text/turtle" in the curl request on the collection. It could also be available at
https://databus.dbpedia.org/dbpedia/collections/pre-release-2019-08-30.ttl via a 303 redirect. Could you record that feature in the issue tracker? I think it is not a priority, but it would be very cool to have. It is related to some of the issues here: https://github.com/dbpedia/databus-maven-plugin/issues. DataId needs some changes, but we decided to focus on data output and usability first.

Created an issue here: https://github.com/dbpedia/databus-maven-plugin/issues/101

The link leading to this pre-release from the main DBpedia download page is broken: it is still in the old format.

The first link on https://wiki.dbpedia.org/develop/datasets (https://databus.dbpedia.org/dbpedia/collections/release-2019-08-30) returns "Unable to find the collection."

We are working on it. There seems to be a bug on the website that prevents us from making changes to this page.

Best,

Sandra