Databus - query datasets of Wikidata


I have started using the Databus, but I have trouble understanding how to query the data in it.
The data I want to query through a SPARQL endpoint are located here:

Can you give me an example of a query that counts all resources in this dataset?

Thank you.

Hi Jan,
the goal of the Databus is to provide a replication/deployment infrastructure. At the moment, self-deployment is already implemented (but not well documented). There are two ways to set up your own SPARQL endpoint:

  1. Use the query and the Databus client:
  2. Log in, create a collection, and use this

I understand that you want to get to know the data better before loading it. At the moment, we offer a preview (on the page, you can fold open the > to see the first 10 lines). There is also a property called dataid:nonEmptyLines "140614"^^xsd:decimal ; but it is still broken: the dataset is almost 4GB and therefore probably has more than 140k lines.

We are currently implementing a triple store that keeps an analysis of all files on the bus, including VOID. VOID has void:distinctSubjects, which is what you are looking for. This will need 2-4 weeks (maybe more) to be effective.

Otherwise you can do:

curl "$downloadURL" | lbzip2 -dc | cut -f1 -d '>' | sort -u --parallel=8 | wc -l

to get a count of all distinct subjects.
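
To see what the counting part of that pipeline does before running it on a 4GB download, here is a minimal sketch on a tiny sample. It assumes the file is N-Triples with one triple per line and an IRI as subject; the function name count_distinct_subjects and the example.org triples are made up for illustration.

```shell
# Count distinct subjects in an N-Triples stream read from stdin.
# Assumption: one triple per line, subject is an IRI in angle brackets.
count_distinct_subjects() {
    # Keep everything before the first '>' (the subject IRI without its
    # closing bracket), drop duplicates, then count the remaining lines.
    # tr strips the padding some wc implementations add around the number.
    cut -f1 -d '>' | LC_ALL=C sort -u | wc -l | tr -d '[:space:]'
}

# Hypothetical sample: three triples, two distinct subjects.
sample='<http://example.org/a> <http://example.org/p> "1" .
<http://example.org/a> <http://example.org/q> "2" .
<http://example.org/b> <http://example.org/p> "3" .'

printf '%s\n' "$sample" | count_distinct_subjects   # prints 2
```

On the real file, the same function would take the place of the last three stages: curl "$downloadURL" | lbzip2 -dc | count_distinct_subjects. Lines whose subject is a blank node would not be cut cleanly by this approach, so treat the result as an estimate if the dataset contains any.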