Databus - query datasets of Wikidata

jan.zhouf · April 2, 2020, 5:30pm

Hi,

I have started using the Databus, but I have trouble understanding how to query the data on it.
The data I want to query through the SPARQL endpoint https://databus.dbpedia.org/repo/sparql is located here: https://databus.dbpedia.org/dbpedia/wikidata/mappingbased-properties-reified/.

Can you give me an example of a query that counts all resources in this dataset?

Thank you.

kurzum · April 4, 2020, 3:40pm

Hi Jan,
the goal of the Databus is to provide a replication/deployment infrastructure. At the moment, self-deployment is already implemented (but not thoroughly documented). There are two ways to set up your own SPARQL endpoint:

1. Use a query together with the Databus client: https://github.com/dbpedia/databus-client#docker-example-deploy-a-small-dataset-to-docker-sparql-endpoint
2. Log in, create a collection, and use https://github.com/dbpedia/Dockerized-DBpedia

I understand that you first want to get to know the data better before loading it. At the moment, we have a preview (on the page, you can fold open the > to see the first 10 lines). There is also a property called dataid:nonEmptyLines "140614"^^xsd:decimal ; but it is still broken: the dataset is almost 4 GB and therefore probably has more than 140k lines.

We are currently implementing a triple store that keeps an analysis of all files on the bus, including VoID (https://www.w3.org/TR/void/). VoID has void:distinctSubjects, which is what you are looking for. This will need 2-4 weeks (maybe more) before it is usable.

Otherwise you can do:

curl $downloadURL | lbzip2 -dc | cut -f1 -d '>' | sort -u --parallel=8 | wc -l

to get a count of all distinct subjects.
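
The counting step of that pipeline can be tried on a few inline lines before running it against a multi-gigabyte download. This is only a sketch under assumptions: the function name count_distinct_subjects and the example.org IRIs are made up for illustration, and it assumes N-Triples input with one triple per line and IRI subjects. Blank-node subjects would be over-counted by the cut heuristic.

```shell
# Count distinct subjects in N-Triples read from stdin.
# Assumption: one triple per line and every subject is an IRI, so the subject
# is everything before the first '>' (same heuristic as the one-liner above).
count_distinct_subjects() {
  grep -v '^[[:space:]]*#' |   # drop comment lines
    grep -v '^[[:space:]]*$' | # drop empty lines
    cut -d '>' -f1 |           # keep only the subject part
    LC_ALL=C sort -u |         # deduplicate, byte-wise for speed
    wc -l |
    tr -d ' '                  # some wc implementations pad the number
}

# Self-check on three hypothetical triples with two distinct subjects.
printf '%s\n' \
  '<http://example.org/a> <http://example.org/p> "1" .' \
  '<http://example.org/a> <http://example.org/q> "2" .' \
  '<http://example.org/b> <http://example.org/p> "3" .' |
  count_distinct_subjects
```

For the real file, the same function can take the place of the last three stages: curl $downloadURL | lbzip2 -dc | count_distinct_subjects, where $downloadURL is the download link of the file shown on the dataset page.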