Virtuoso with latest DBpedia Dumps / missing data

Hello guys,
I have built the latest DBpedia models (EN and IT) for Spotlight, as you can read here: Consolidate Update Interval of DBpedia Spotlight

Now I have some entities that are not recognised by the online DBpedia, such as: http://dbpedia.org/page/Presidency_of_Donald_Trump

So the first question is: what is the latest update of the online DBpedia?

Now I want to create my own Virtuoso installation from these models, because I need to run SPARQL queries on the new entities, but I don’t understand which kind of files are needed to feed Virtuoso and run SPARQL queries.

Can anybody help me?


Hi @klaus82 w.r.t.
Q1: 2016-10 is loaded in the public endpoint at the moment. In DBpedia Live you have recent data, see e.g. http://live.dbpedia.org/page/Presidency_of_Donald_Trump
Q2: You can use [1] to set up your own endpoint and use [2] as the collection.

[1] https://github.com/dbpedia/Dockerized-DBpedia
[2] https://databus.dbpedia.org/dbpedia/collections/pre-release-2019-08-30/
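Once an endpoint is set up, a quick way to check whether a given entity made it into the store is an ASK query (just a sketch, using the entity from the question above as an example):

```sparql
# Returns true if the endpoint contains at least one triple
# about the entity, false otherwise.
ASK { <http://dbpedia.org/resource/Presidency_of_Donald_Trump> ?p ?o }
```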

Thanks @jfrey
it seems that databus-download-min was removed from Docker Hub, so the docker-compose doesn’t run.
I fixed the docker-compose and made a pull request [1].
databus-download-min has moved to [2], as per the link on the GitHub project page [3].

[1] https://github.com/dbpedia/Dockerized-DBpedia/pull/14
[2] https://hub.docker.com/r/dbpedia/minimal-download-client
[3] https://github.com/dbpedia/minimal-download-client#docker-image

Ok, thank you. Just for the record: databus-download-min is the local image name, which requires that the image is built locally first. But there is only a hint for this step in the docs, not very user-friendly ;-). I think it would actually make sense to fetch both images from Docker Hub and use an override compose file to build the images locally in case it is needed. That is why I have not accepted your PR so far; we need to discuss that first. But thank you for reporting the issue.

Hello @jfrey, one more question: is live.dbpedia.org built from pre-release-2019-08-30 or from the old release?

Thanks

live is live :notes:
live is updated with a delay of a few hours behind Wikipedia via https://www.mediawiki.org/wiki/API:Recent_changes_stream

Hello @jfrey, I tried your suggestion, but I can’t get the same results as live.dbpedia.org.
This is my example query:

# dbo is predefined on the public endpoint, but may need to be
# declared explicitly on a local Virtuoso installation.
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?uri ?description
WHERE {
  ?uri dbo:abstract ?description .
  FILTER (?uri = <http://dbpedia.org/resource/IBM>)
}

If I run the Virtuoso endpoint with the docker-compose I don’t get any results; if I run this query on live.dbpedia.org I get the result I want.
Why? Can you help me with this?

Thank you in advance

Can you please run the following query?

SELECT * WHERE {
  <http://dbpedia.org/resource/IBM> ?p ?o .
}

I had a look and the English mapping files from 2019.09.01 do not contain a single triple for IBM. So if the query above returns some triples, you did not do anything wrong. I guess this is caused by something I wrote here: Local vs online dbpedia versions: different results are returned
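A follow-up check (just a sketch) is to see which named graphs hold data for the resource, since Dockerized-DBpedia loads all files into a single graph:

```sparql
# List the named graphs that contain any triple about IBM,
# together with the number of such triples per graph.
SELECT ?g (COUNT(*) AS ?triples)
WHERE {
  GRAPH ?g { <http://dbpedia.org/resource/IBM> ?p ?o }
}
GROUP BY ?g
```

If this returns no rows at all, the data for the entity was simply never loaded; if it returns rows in an unexpected graph, the default graph setting of the endpoint is the thing to look at.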

The query returns this:

Alright. I am a bit surprised about this one rdf:type dbo:Company triple, but apart from that there seems to be nothing wrong on your side. You can either wait until a new release is performed, combine 2019.09.01 with the 2016-10 release, or use DBpedia Live mirroring [1]. @kurzum what is the status of the 2016-10 release on Databus, and does Live Mirroring still work out of the box after the redeploy of live?

[1] http://dev.dbpedia.org/DBpedia_Live_Mirroring

How can I combine the 2019.09.01 and 2016-10 releases in the docker-compose?

You could remove the download-min container from the compose file and put the files into the target folder of the loader container manually.
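If the endpoint has SPARQL Update enabled, another option is to load additional dump files into the same graph directly via SPARQL 1.1 LOAD. This is only a sketch: the source URL below is a placeholder (use the actual download link of the 2016-10 file you want), and the target graph name is an assumption that should match whatever graph the loader used:

```sparql
# Load an extra dump file into the graph the other data was loaded into.
# The source URL is a placeholder, not a real download link.
LOAD <http://example.org/dumps/mappingbased-literals_lang=en.ttl>
  INTO GRAPH <http://dbpedia.org>
```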

Hello @janfo, this way won’t I get duplicates? How are these duplicates handled by Virtuoso?

@klaus82 it seems like you received some very complicated answers, although you have one of the most basic use cases, i.e. creating a mirror of DBpedia.

Let me answer this:

  1. setting up a DBpedia Live Mirror is extremely complicated. Do not try that. @jfrey, please stop suggesting this to people unless they specifically ask about it. It is extremely hard and time-consuming.
  2. much easier is the ad-hoc extraction web service. It hosts the extraction framework as a web service, and each request there triggers a request to Wikipedia, see e.g. here: http://dbpedia.informatik.uni-leipzig.de:9998/server/extraction/en/extract?title=Angela+Merkel&revid=&format=trix&extractors=custom @jfrey could you take on the task of adding a correct manual on how to set this up on GitHub and then include it into dev.dbpedia.org
  3. [RECOMMENDED] @janfo could you hurry your two tasks and try to finish them? The first was to refactor the pre-release collection into a latest collection. The second is to make the Dockerized DBpedia run with collections smoothly. This is actually what @klaus82 wants and it coincides with what we are doing currently.
  4. @klaus82 If you don’t like the collections we create, you can always a) create a Databus account, b) create your own collection or copy and modify an existing one. Note that they are all loaded into the same graph by default, so any duplicate triples will disappear.
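The “same graph” behaviour mentioned in point 4 can be checked directly: an RDF graph is a set of triples, so loading the same triple from two releases leaves only one copy. A small sketch, counting copies of the IBM type triple seen earlier in the thread:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>

# Count how many copies of a given triple the graph holds;
# because a graph is a set, the result can only be 0 or 1.
SELECT (COUNT(*) AS ?copies)
WHERE {
  GRAPH <http://dbpedia.org> {
    <http://dbpedia.org/resource/IBM> rdf:type dbo:Company .
  }
}
```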

We have this collection now, which should be the data you need.
https://databus.dbpedia.org/dbpedia/collections/latest-core

Please use it with this dockerized dbpedia container:

The query of the collection will fetch the latest release; we are working on static collections (selecting a specific version) for the public endpoint.

As far as I understood, the problem is that the releases used in the collection so far are missing data that is important for @klaus82. The question now is which of the solutions fits his use case best. @klaus82, please let us know what you actually want to do. If you only want to retrieve data per entity and run no “analytical” queries, then I could write up how to perform solution 2 (ad-hoc extraction). I doubt that the collection @janfo posted will bring you the data you are interested in, since it still uses an old release at the moment (2019-09-01).

Hello @jfrey, what I need is:

  • redirects
  • Wikipedia page ID
  • description
  • thumbnail

for English and Italian.

With live.dbpedia I can get this information, but only for English; on DBpedia I can get many languages, among which Italian.
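A query covering those four items might look like the following sketch. It assumes the usual DBpedia properties for these fields (dbo:wikiPageRedirects, dbo:wikiPageID, dbo:abstract, dbo:thumbnail) and uses IBM from earlier in the thread as the example entity; the language filter selects English and Italian abstracts:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?redirect ?pageId ?abstract ?thumbnail
WHERE {
  VALUES ?entity { <http://dbpedia.org/resource/IBM> }
  # Pages that redirect to the entity.
  OPTIONAL { ?redirect dbo:wikiPageRedirects ?entity . }
  # The numeric Wikipedia page ID.
  OPTIONAL { ?entity dbo:wikiPageID ?pageId . }
  # The abstract, restricted to English and Italian.
  OPTIONAL {
    ?entity dbo:abstract ?abstract .
    FILTER (lang(?abstract) IN ("en", "it"))
  }
  # The thumbnail image URL.
  OPTIONAL { ?entity dbo:thumbnail ?thumbnail . }
}
```

Whether this returns anything on a local endpoint depends, as discussed above, on which release the loaded collection actually contains.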