Virtuoso with latest DBpedia Dumps / missing data

Ok, thank you. Just for the record: databus-download-min is the local image name, which requires that the image is built locally beforehand. But there is only a hint for this step in the docs, not very user-friendly ;-). I think it would actually make sense to fetch both images from dockerhub and use an override compose file to build the images locally in case it is needed. That is why I have not accepted your PR so far: we need to discuss that first. But thank you for reporting the issue.

Hello @jfrey, one more question: is live.dbpedia.org built from the pre-release-2019-08-30 or from the old one?

Thanks

live is live :notes:
live is updated with a delay of some hours from Wikipedia via https://www.mediawiki.org/wiki/API:Recent_changes_stream

Hello @jfrey, I tried your suggestion, but I can’t get the same result as live.dbpedia.org.
This is my example query:

SELECT ?uri ?description
WHERE {?uri dbo:abstract ?description .
FILTER (?uri = <http://dbpedia.org/resource/IBM>) }

If I run this query against the Virtuoso endpoint started with the docker-compose, I don’t get any results; if I run it on live.dbpedia.org, I get the result I want.
Why? Can you help me with this?

Thank you in advance

Can you please run the following query?

Select * where {
<http://dbpedia.org/resource/IBM> ?p ?o.}

I had a look, and the English mapping files from 2019.09.01 do not contain a single triple for IBM. So if the query above returns some triples, you did not do anything wrong. I guess this is caused by something I wrote in the thread "Local vs online dbpedia versions: different results are returned".
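For reference, the check above can also be scripted. This is a minimal sketch, assuming the local Virtuoso from the docker-compose setup serves SPARQL at http://localhost:8890/sparql (an assumed address, adjust host and port to your compose file). The helper names are illustrative and not part of any DBpedia tooling.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the local Virtuoso SPARQL endpoint; adjust to your setup.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"


def describe_query(resource_uri: str) -> str:
    """Return a query listing every predicate/object pair of one resource."""
    return f"SELECT * WHERE {{ <{resource_uri}> ?p ?o . }}"


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request (SPARQL 1.1 Protocol, 'query' parameter)."""
    params = urllib.parse.urlencode({"query": query})
    return f"{endpoint}?{params}"


def run_query(endpoint: str, query: str) -> list:
    """Send the query and return the result bindings. Needs a running endpoint."""
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


# Usage, once the endpoint is up:
#   query = describe_query("http://dbpedia.org/resource/IBM")
#   for row in run_query(LOCAL_ENDPOINT, query):
#       print(row["p"]["value"], row["o"]["value"])
```

Running the same query against both the local endpoint and live.dbpedia.org makes it easy to see which predicates are missing locally.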

The query returns this:

Alright. I am a bit surprised about this one rdf:type dbo:Company triple, but apart from that there seems to be nothing wrong on your side. You can only wait until a new release is performed, or combine 2019.09.01 with the 2016-10 release, or use DBpedia Live mirroring [1]. @kurzum what is the status of the 2016-10 release on Databus, and does Live Mirroring still work out of the box after the redeploy of live?

[1] http://dev.dbpedia.org/DBpedia_Live_Mirroring

How can I combine the 2019.09.01 and 2016-10 releases in the docker-compose?

You could remove the download-min container from the compose file and put the files into the target folder of the loader container manually.

Hello @janfo, could I end up with duplicates this way? How are these duplicates handled by Virtuoso?

@klaus82 seems like you received some very complicated answers, although you have one of the most basic use cases, i.e. create a mirror of DBpedia.

Let me answer this:

  1. setting up a DBpedia Live Mirror is extremely complicated. Do not try that. @jfrey please stop suggesting this to people, unless they specifically ask about it. It is extremely hard and time-consuming.
  2. much easier is the ad-hoc extraction web service. It hosts the extraction framework as a web service, and each request there triggers a request to Wikipedia, see e.g. here: http://dbpedia.informatik.uni-leipzig.de:9998/server/extraction/en/extract?title=Angela+Merkel&revid=&format=trix&extractors=custom @jfrey could you take the task of adding a correct manual on how to set this up on github and then include it into dev.dbpedia.org
  3. [RECOMMENDED] @janfo could you hurry your two tasks and try to finish them? The first was to refactor the pre-release collection into a latest collection. The second is to make the Dockerized DBpedia run with collections smoothly. This is actually what @klaus82 wants and it coincides with what we are doing currently.
  4. @klaus82 If you don’t like the collections we create, you can always a) create a databus account, and b) create your own collection or copy an existing one and modify it. Note that they are all loaded into the same graph by default, so any duplicate triples should disappear.
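Why duplicates disappear: an RDF graph is a set of triples, so loading the same triple twice into one graph leaves a single copy. Here is a minimal sketch of that behaviour, using plain Python sets rather than Virtuoso itself. The sample triples and the two release lists are made up for illustration.

```python
# Each triple is a (subject, predicate, object) tuple. The data is illustrative only.
IBM = "http://dbpedia.org/resource/IBM"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
COMPANY = "http://dbpedia.org/ontology/Company"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

release_old = [(IBM, RDF_TYPE, COMPANY), (IBM, LABEL, '"IBM"@en')]
release_new = [(IBM, RDF_TYPE, COMPANY)]


def load_into_graph(*releases):
    """Merge several triple lists into one graph, modelled as a set."""
    graph = set()
    for release in releases:
        graph.update(release)
    return graph


graph = load_into_graph(release_old, release_new)
print(len(graph))  # 2: the shared rdf:type triple is stored once
```

Triples that differ in any part, for example two different abstracts for the same resource coming from two releases, are not duplicates and will both remain in the graph.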

We have this collection now, which should be the data you need.
https://databus.dbpedia.org/dbpedia/collections/latest-core

Please use it with this dockerized dbpedia container:

The query of the collection will fetch the latest release; we are working on static collections (selecting a specific version) for the public endpoint.

As far as I understand, the problem is that the releases used in the collection so far are missing data that is important for @klaus82. The question now is which of the solutions fits his use case best. @klaus82 please let us know what you actually want to do. If you only want to retrieve data per entity and do not need “analytical” queries, then I could write up how to perform solution 2 (ad-hoc extraction). I doubt that the collection @janfo posted will bring you the data you are interested in, since it is still using an old release at the moment (2019-09-01).
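As a rough sketch of what a request to the ad-hoc extraction service looks like, the snippet below rebuilds the example URL from point 2 above for an arbitrary page title. The parameter names are taken from that example URL only; whether other values are accepted is not confirmed here, and the function name is illustrative.

```python
from urllib.parse import urlencode

# Base URL copied from the example link in this thread.
EXTRACTION_BASE = "http://dbpedia.informatik.uni-leipzig.de:9998/server/extraction"


def extraction_url(title: str, lang: str = "en") -> str:
    """Build an extraction request URL with the same parameters as the example."""
    params = urlencode(
        {"title": title, "revid": "", "format": "trix", "extractors": "custom"}
    )
    return f"{EXTRACTION_BASE}/{lang}/extract?{params}"


print(extraction_url("Angela Merkel"))
```

For "Angela Merkel" this reproduces the link quoted in point 2.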

Hello @jfrey, what I need is:

  • redirects
  • wikipedia page id
  • description
  • thumbnail

for english and italian.

With live.dbpedia.org I can get this information, but only for English; in DBpedia I can get it for many languages, including Italian.

@jfrey @klaus82 Jan is providing the recommended solution. Klaus has updated a local spotlight version and now he wants more information about found entities, such as Trump. This info is not in 2016, but it is in https://databus.dbpedia.org/dbpedia/collections/latest-core (2019-09).
The collection will update automatically, so re-deploying the DBpedia Docker brings you fresher data now and then.
@klaus82 we established latest-core for users to adapt:

  1. get a databus account and log in
  2. go to https://databus.dbpedia.org/dbpedia/collections/latest-core and click on Actions -> Copy Edit. This will create a collection in your space
  3. Add the Italian datasets you need.
  4. (Optional) remove some datasets that you don’t need, so loading is faster
  5. Publish your collection and put the collection URL into https://github.com/dbpedia/Dockerized-DBpedia
  6. do your queries, either SPARQL or use the IP, e.g. 127.0.0.1:$port for linked data (might need some setup, so sparql is easier).
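Once the endpoint is loaded, the four pieces of information asked for above (redirects, Wikipedia page ID, description, thumbnail) can be requested in one query. The sketch below only builds the SPARQL string. It assumes the loaded datasets use the DBpedia ontology properties dbo:wikiPageRedirects, dbo:wikiPageID, dbo:abstract and dbo:thumbnail; which of these are actually present depends on the datasets in your collection, and the function name is illustrative.

```python
def entity_query(resource_uri: str, lang: str = "en") -> str:
    """Build a query for redirect target, page ID, abstract and thumbnail."""
    s = f"<{resource_uri}>"
    return "\n".join(
        [
            "PREFIX dbo: <http://dbpedia.org/ontology/>",
            "SELECT ?redirect ?pageId ?description ?thumbnail WHERE {",
            f"  OPTIONAL {{ {s} dbo:wikiPageRedirects ?redirect . }}",
            f"  OPTIONAL {{ {s} dbo:wikiPageID ?pageId . }}",
            # Keep only the abstract in the requested language.
            f"  OPTIONAL {{ {s} dbo:abstract ?description .",
            f'             FILTER (lang(?description) = "{lang}") }}',
            f"  OPTIONAL {{ {s} dbo:thumbnail ?thumbnail . }}",
            "}",
        ]
    )


print(entity_query("http://dbpedia.org/resource/IBM", lang="it"))
```

Every pattern is OPTIONAL so that a missing dataset leaves a variable unbound instead of emptying the whole result.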

Thanks @kurzum for your reply.

I noticed that the databus endpoint used to download data for spotlight, as you suggested to me in the post [1]:

My goal is: given a text analysed by spotlight, I need to find out which of the entities recognised by spotlight have changed between the 2016 release and this latest release but can still be traced back to an entity of 2016. E.g.: http://dbpedia.org/resource/Adobe_Systems and http://dbpedia.org/resource/Adobe_Inc.
To do this I query Virtuoso and try to match the entities, but if the datasets downloaded for spotlight and for Virtuoso are different, I could get matching errors or mismatches.
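One possible way to do such matching is to follow redirects until a final URI is reached and compare the results. This is only a sketch under assumptions: the redirect mapping below is hand-written sample data (in practice it could be filled from dbo:wikiPageRedirects triples, if the loaded release contains a redirect between the two names), and the function names are illustrative.

```python
def resolve(uri: str, redirects: dict, max_hops: int = 10) -> str:
    """Follow redirect links until a URI without a further redirect is reached."""
    seen = {uri}
    for _ in range(max_hops):
        target = redirects.get(uri)
        if target is None or target in seen:  # end of chain, or a redirect loop
            break
        seen.add(target)
        uri = target
    return uri


def same_entity(old_uri: str, new_uri: str, redirects: dict) -> bool:
    """Treat two URIs as the same entity if they resolve to the same target."""
    return resolve(old_uri, redirects) == resolve(new_uri, redirects)


# Sample data, assumed for illustration only.
redirects = {
    "http://dbpedia.org/resource/Adobe_Systems": "http://dbpedia.org/resource/Adobe_Inc.",
}

print(
    same_entity(
        "http://dbpedia.org/resource/Adobe_Systems",
        "http://dbpedia.org/resource/Adobe_Inc.",
        redirects,
    )
)
```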

Do you have any suggestions to achieve my goal?

[1] Consolidate Update Interval of DBpedia Spotlight

I am suggesting that you build this synchronization yourself.

  1. we are currently working on updating the pre-built tar.gz for spotlight on the bus. But you also know how to build it.
  2. Then you select matching versions from DBpedia and load them into a local endpoint via https://github.com/dbpedia/Dockerized-DBpedia

dbpedia.org/sparql and the spotlight service might be synchronized in the future, but I don’t know when. Until then you need to build this yourself.