Inference by DBpedia ontology

Hi there!
One more naive question. The class dbo:Actor is stated to be the rdfs:range of dbo:starring in the DBpedia ontology (within the DBpedia dataset).
Still, the query
PREFIX : <http://dbpedia.org/ontology/>
SELECT ?Film WHERE {
  ?Film a :Film .
  <http://dbpedia.org/resource/Julia_Roberts> a :Actor .
  ?Film :starring <http://dbpedia.org/resource/Julia_Roberts> .
}
does not return any result, while dropping the :Actor type assertion does return results:
PREFIX : <http://dbpedia.org/ontology/>
SELECT ?Film WHERE {
  ?Film a :Film .
  ?Film :starring <http://dbpedia.org/resource/Julia_Roberts> .
}
The apparent reason for this difference is that http://dbpedia.org/resource/Julia_Roberts is not explicitly asserted to be of type dbo:Actor, and the public DBpedia endpoint does not perform inference based on the ontology's rdfs:range assertions.
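To make the difference concrete, here is a minimal sketch of what the rdfs:range rule (rule rdfs3 in the RDF Semantics spec) would add. It runs on a tiny in-memory set of triples, not on the endpoint; the prefixed names and the helper functions `apply_range_rule` and `films_starring_actor` are illustrative assumptions, not anything DBpedia provides.

```python
# Toy triples with shortened, prefixed names (illustrative, not fetched from DBpedia).
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"

triples = {
    ("dbr:Pretty_Woman", RDF_TYPE, "dbo:Film"),
    ("dbr:Pretty_Woman", "dbo:starring", "dbr:Julia_Roberts"),
    ("dbo:starring", RDFS_RANGE, "dbo:Actor"),  # the ontology axiom
}


def apply_range_rule(graph):
    """Return the graph plus the types entailed by rdfs:range.

    Rule: if (p rdfs:range C) and (s p o) then (o rdf:type C).
    """
    ranges = {(s, o) for s, p, o in graph if p == RDFS_RANGE}
    inferred = {
        (o, RDF_TYPE, cls)
        for s, p, o in graph
        for prop, cls in ranges
        if p == prop
    }
    return graph | inferred


def films_starring_actor(graph, person):
    """Mimic the first query: films starring `person`, who must be typed dbo:Actor."""
    if (person, RDF_TYPE, "dbo:Actor") not in graph:
        return set()
    return {
        s
        for s, p, o in graph
        if p == "dbo:starring" and o == person and (s, RDF_TYPE, "dbo:Film") in graph
    }


without_inference = films_starring_actor(triples, "dbr:Julia_Roberts")
with_inference = films_starring_actor(apply_range_rule(triples), "dbr:Julia_Roberts")
```

On the raw triples the lookup comes back empty, because the dbo:Actor type is never asserted; after applying the range rule the same lookup finds the film.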
My series of related questions then is:

  • would it make sense (by the design of DBpedia) to build a DBpedia SPARQL endpoint that does perform the inferences specified in the DBpedia ontology?
  • if so, has somebody already looked at creating a DBpedia SPARQL endpoint instance with the ontology inferences performed (either on the fly or as a materialization)?
  • or is the ontology rather to be viewed in a “constraint” sense, so that the missing triple
    <http://dbpedia.org/resource/Julia_Roberts> a :Actor. should be considered incomplete data?

A naive (possibly somewhat extreme) view would be that it makes little sense to offer the DBpedia data without ontology inferencing, if that inferencing is expected to be performed in order to arrive at a semantically valid data model.

Thanks a lot for an explanation!
Kārlis

Hi @karlisc,
Did you use http://dbpedia.org/sparql with data from 2016/17, or the new endpoint (https://dbpedia.demo.openlinksw.com/sparql)? Note that we will soon switch to the new one.

Regarding the type information, please have a look here: https://databus.dbpedia.org/dbpedia/mappings/instance-types/2020.10.01

Two files are loaded: the “specific” one, i.e. the types as they are produced by DIEF, and the “transitive” one, containing all inferences. So there is a forward-chaining materialization. Technically, Virtuoso also supports backward-chaining reasoning, executed for each query. But here we load the inferences materialized, as that is faster: no additional inference needs to be done per query. The reason is that almost every query wants this inference.
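As a rough illustration of forward-chaining materialization of types, the sketch below starts from “specific” types and adds every superclass type until nothing new can be derived. It is not the DIEF code or what Virtuoso does internally; the `ex:` class names, the hierarchy, and the function `materialize_types` are made up for the example.

```python
def materialize_types(specific_types, subclass_of):
    """Forward-chain rdf:type along rdfs:subClassOf until a fixpoint is reached.

    specific_types: set of (instance, class) pairs
    subclass_of:    set of (subclass, superclass) pairs
    """
    closed = set(specific_types)
    changed = True
    while changed:
        changed = False
        for inst, cls in list(closed):
            for sub, sup in subclass_of:
                if sub == cls and (inst, sup) not in closed:
                    closed.add((inst, sup))
                    changed = True
    return closed


# Hypothetical class hierarchy and one "specific" type assertion.
subclass_of = {("ex:Actor", "ex:Artist"), ("ex:Artist", "ex:Person")}
specific = {("ex:someone", "ex:Actor")}

all_types = materialize_types(specific, subclass_of)
transitive_only = all_types - specific
```

Once the closure is stored, a query for ex:Person instances is a plain lookup, which is the trade-off described above: more triples loaded, no reasoning work at query time.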

The inferred types should therefore already be in place. If a type is not available, it is a problem with the mappings at mappings.dbpedia.org or with the DIEF extraction.

Hi Sebastian (@kurzum),

thanks a lot for your reply and thanks for noting the new endpoint!

It seems, though, that the new endpoint does not have all the data that were in the old one.

My queries showing that inference has not been done properly (e.g. A or B below) were run on the old endpoint. Unfortunately, on the new endpoint these queries fail with a server-side error because the estimated execution time is larger than 240.

A. select distinct ?a ?b (count(?x) as ?cx) where
{?a rdfs:subClassOf ?b. ?x a ?a. FILTER NOT EXISTS{?x a ?b}}
group by ?a ?b
order by desc(?cx)
B. select ?c (count(?x) as ?cx) where {?x rdf:type/rdfs:subClassOf* ?c.
FILTER NOT EXISTS {?x rdf:type ?c}}
group by ?c
order by desc(?cx)
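For anyone who goes the local route, the check in query A can also be run over a dump of type and subclass pairs. This is only a sketch under that assumption: the input sets, the `ex:` names, and the function `missing_superclass_counts` are hypothetical.

```python
from collections import Counter


def missing_superclass_counts(types, subclass_of):
    """For each (subclass, superclass) edge, count instances typed with the
    subclass but not with the superclass, most frequent edge first."""
    counts = Counter()
    for inst, cls in types:
        for sub, sup in subclass_of:
            if sub == cls and (inst, sup) not in types:
                counts[(sub, sup)] += 1
    return counts.most_common()


# Hypothetical data: ex:a1 lacks ex:Artist; ex:a2 and ex:p1 lack ex:Person.
types = {
    ("ex:a1", "ex:Actor"),
    ("ex:a2", "ex:Actor"),
    ("ex:a2", "ex:Artist"),
    ("ex:p1", "ex:Artist"),
}
subclass_of = {("ex:Actor", "ex:Artist"), ("ex:Artist", "ex:Person")}

report = missing_superclass_counts(types, subclass_of)
```

An empty report would mean the types are fully materialized with respect to the direct subclass edges, which is what query A tests on the endpoint.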

Is there anything that can be done about the DBpedia server constraints, or would the best way forward be to create a local installation?

Thanks in advance!

Kārlis

@karlisc could you use ``` and ` to highlight code & queries?

Took me a while to remember, but I figured it out.

Hi @kurzum,

Thank you for pointing to the new DBpedia endpoint.
https://dbpedia.demo.openlinksw.com/sparql

What is the difference in terms of datasets loaded into the current DBpedia.org endpoint and the new endpoint?

E.g. the current endpoint has ~4 million gold:hypernym assertions while the new endpoint has none:

SELECT ?s ?hypernym
WHERE {
  ?s <http://purl.org/linguistics/gold/hypernym> ?hypernym .
} 
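To compare the two endpoints on the same footing, one option is to send the same COUNT query to both. The sketch below only builds the requests, using the standard SPARQL Protocol `query` parameter over GET; the COUNT form of the query, the `ENDPOINTS` labels, and the helper `build_count_request` are assumptions added for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The two endpoints mentioned in this thread.
ENDPOINTS = {
    "current": "http://dbpedia.org/sparql",
    "new": "https://dbpedia.demo.openlinksw.com/sparql",
}

# Counting variant of the query above (an assumption, not the original query).
COUNT_QUERY = """
SELECT (COUNT(*) AS ?n)
WHERE { ?s <http://purl.org/linguistics/gold/hypernym> ?hypernym . }
"""


def build_count_request(endpoint_url, query=COUNT_QUERY):
    """Build, but do not send, a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint_url + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


requests_by_name = {
    name: build_count_request(url) for name, url in ENDPOINTS.items()
}
```

Passing each request to `urllib.request.urlopen` and parsing the JSON body would give one count per endpoint; whether a full count finishes within the server-side limits is a separate matter.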

The current endpoint contains static data from 2017.
The new endpoint is loaded at regular intervals from https://databus.dbpedia.org/dbpedia/collections/latest-core

Pros:

  • we know exactly what is loaded
  • community can maintain and add to the collection
  • more up to date, e.g. EN Wikipedia has doubled since 2017

Cons: