Inference by DBPedia ontology

karlisc · December 1, 2020, 6:21pm

Hi there!
One more naive question. The class dbo:Actor is said to be the rdfs:range of dbo:starring in the DBPedia ontology (within the DBPedia dataset).
Still, the query
PREFIX : <http://dbpedia.org/ontology/>
SELECT ?Film WHERE {
  ?Film a :Film .
  <http://dbpedia.org/resource/Julia_Roberts> a :Actor .
  ?Film :starring <http://dbpedia.org/resource/Julia_Roberts> .
}
does not return any result, while dropping the :Actor type assertion leads to results.
PREFIX : <http://dbpedia.org/ontology/>
SELECT ?Film WHERE {
  ?Film a :Film .
  ?Film :starring <http://dbpedia.org/resource/Julia_Roberts> .
}
The apparent reason for this difference is that <http://dbpedia.org/resource/Julia_Roberts> is not explicitly asserted to be of type dbo:Actor, and the public DBPedia endpoint does not perform inference based on the ontology's rdfs:range assertions.
My series of related questions then is:

would it make sense (by the design of DBPedia) to build a DBPedia SPARQL endpoint that does perform the inferences specified in the DBPedia ontology?
if so, has somebody already looked at creating DBPedia SPARQL endpoint instance with the ontology inferences performed (either on the fly, or as materialization)?
or is the ontology possibly to be viewed in a “constraint” sense, so that the missing triple <http://dbpedia.org/resource/Julia_Roberts> a :Actor . should rather be considered incomplete data?

A naive (possibly somewhat extreme) view would be that it does not make much sense to offer the DBPedia data without ontology inferencing, if the inferencing is expected to be performed to achieve a semantically valid data model.

Thanks a lot for an explanation!
Kārlis

kurzum · December 7, 2020, 11:28am

Hi @karlisc,
Did you use the old endpoint (OpenLink Virtuoso SPARQL Query Editor) with data from 2016/17, or the new endpoint at https://dbpedia.demo.openlinksw.com/sparql ? Note that we will soon switch to the new one.

Regarding the type information, please have a look here: OIDC Form_Post Response

Two files are loaded: the "specific" types, i.e. as they are produced by DIEF, and the “transitive” types, containing all inferences. So there is a forward-chaining materialization. Technically, Virtuoso also supports backward-chaining reasoning executed for each query, but here we load the materialized data because it is faster: no additional inference needs to be done per query, and almost every query wants this inference.
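To make the effect of materialization concrete, here is a minimal Python sketch of forward chaining for the rdfs:range rule on a toy triple set. The data, the string-tuple representation, and the helpers `materialize_range` and `films_starring_actor` are assumptions made purely for illustration; this is not how Virtuoso or DIEF implement it, and the thread does not say whether the loaded “transitive” file also contains range-derived types.

```python
# Toy data and helper names are illustrative assumptions, not DBpedia content.
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"

ontology = {(DBO + "starring", RDFS_RANGE, DBO + "Actor")}
data = {
    (DBR + "Pretty_Woman", RDF_TYPE, DBO + "Film"),
    (DBR + "Pretty_Woman", DBO + "starring", DBR + "Julia_Roberts"),
}


def materialize_range(data, ontology):
    """Forward chaining: for every (s p o) with (p rdfs:range C), add (o rdf:type C)."""
    ranges = {}
    for prop, pred, cls in ontology:
        if pred == RDFS_RANGE:
            ranges.setdefault(prop, set()).add(cls)
    inferred = set(data)
    for s, p, o in data:
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))
    return inferred


def films_starring_actor(triples, person):
    """Mirrors the first query in the thread: `person` must be typed dbo:Actor."""
    if (person, RDF_TYPE, DBO + "Actor") not in triples:
        return set()
    return {
        s
        for s, p, o in triples
        if p == DBO + "starring"
        and o == person
        and (s, RDF_TYPE, DBO + "Film") in triples
    }


julia = DBR + "Julia_Roberts"
before = films_starring_actor(data, julia)
after = films_starring_actor(materialize_range(data, ontology), julia)
print(before)  # set()
print(after)   # {'http://dbpedia.org/resource/Pretty_Woman'}
```

On the raw toy data the typed query finds nothing; once the inferred type triple has been materialized, the same query finds the film without any extra inference at query time.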

This should already be the case. If a type is not available, it is a problem with the mappings at mappings.dbpedia.org or with the DIEF extraction.

karlisc · December 8, 2020, 11:21am

Hi Sebastian (@kurzum),

thanks a lot for your reply and thanks for noting the new endpoint!

It seems, though, that the new endpoint does not have all the data that was in the old one.

My queries showing that the inference has not been done properly (e.g. A or B below) were run on the old endpoint. Unfortunately, on the new endpoint these queries run into a server-side error because the estimated execution time is larger than 240.

A.
select distinct ?a ?b (count(?x) as ?cx) where {
  ?a rdfs:subClassOf ?b .
  ?x a ?a .
  FILTER NOT EXISTS { ?x a ?b }
}
group by ?a ?b
order by desc(?cx)
B.
select ?c (count(?x) as ?cx) where {
  ?x rdf:type/rdfs:subClassOf* ?c .
  FILTER NOT EXISTS { ?x rdf:type ?c }
}
group by ?c
order by desc(?cx)
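For anyone who wants to try the idea behind query A offline, for example on a local dump, here is a rough Python sketch of the same check on a toy triple set. The triples, the prefixed-string representation, and the helper name `missing_superclass_counts` are assumptions for illustration only; a real run would need to load the actual data.

```python
from collections import Counter


def missing_superclass_counts(triples):
    """For each (a rdfs:subClassOf b), count instances typed a but not typed b."""
    sub_class_of = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    counts = Counter()
    for a, b in sub_class_of:
        for x, t in types:
            if t == a and (x, b) not in types:
                counts[(a, b)] += 1
    # Highest counts first, like "order by desc(?cx)".
    return counts.most_common()


# Toy data (hypothetical): two actors lack the dbo:Artist type, one artist lacks dbo:Person.
toy = {
    ("dbo:Actor", "rdfs:subClassOf", "dbo:Artist"),
    ("dbo:Artist", "rdfs:subClassOf", "dbo:Person"),
    ("ex:a1", "rdf:type", "dbo:Actor"),
    ("ex:a2", "rdf:type", "dbo:Actor"),
    ("ex:a3", "rdf:type", "dbo:Actor"),
    ("ex:a3", "rdf:type", "dbo:Artist"),
    ("ex:a3", "rdf:type", "dbo:Person"),
    ("ex:p1", "rdf:type", "dbo:Artist"),
}

result = missing_superclass_counts(toy)
print(result)
```

The nested loop is quadratic and only meant to mirror the query pattern; for a full dump one would index the type triples by class first.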

Is there anything that can be done with respect to the DBPedia server constraints, or would the best way forward be to create a local installation?

Thanks in advance!

Kārlis

kurzum · December 9, 2020, 8:54am

@karlisc could you use ``` and ` to highlight code & queries?

Took me a while to remember, but I figured it out.

https://databus.dbpedia.org/dbpedia/mappings/instance-types/ is produced by the framework and properly expanded into a transitive closure.
The Czech chapter contributed additional types, generated from the abstracts. These increase coverage, but the transitive closure is not pre-computed for them.
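Expanding such additional types into a transitive closure could, in principle, be done as a small post-processing step. A rough Python sketch, assuming a toy rdfs:subClassOf map; the hierarchy, the instance name, and the helpers `superclasses` and `expand_types` are illustrative and not taken from the DBpedia ontology dump or from DIEF:

```python
def superclasses(cls, sub_class_of):
    """Return all direct and indirect superclasses of `cls` (cycle-safe)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand_types(instance_types, sub_class_of):
    """Add every superclass of each asserted type to each instance."""
    expanded = {}
    for instance, types in instance_types.items():
        closure = set(types)
        for t in types:
            closure |= superclasses(t, sub_class_of)
        expanded[instance] = closure
    return expanded


# Toy hierarchy (assumption): child class -> set of direct parent classes.
hierarchy = {
    "dbo:Actor": {"dbo:Artist"},
    "dbo:Artist": {"dbo:Person"},
}
specific = {"ex:someone": {"dbo:Actor"}}

transitive = expand_types(specific, hierarchy)
print(sorted(transitive["ex:someone"]))  # ['dbo:Actor', 'dbo:Artist', 'dbo:Person']
```

With the closure stored alongside the specific types, a pattern such as `?x a dbo:Person` matches without per-query reasoning.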

CaptSolo · December 11, 2020, 11:38am

Hi @kurzum,

Thank you for pointing to the new DBpedia endpoint.
https://dbpedia.demo.openlinksw.com/sparql

What is the difference in terms of datasets loaded into the current DBpedia.org endpoint and the new endpoint?

E.g. the current endpoint has ~4 million gold:hypernym assertions while the new endpoint has none:

SELECT ?s ?hypernym
WHERE {
  ?s <http://purl.org/linguistics/gold/hypernym> ?hypernym .
}

kurzum · December 13, 2020, 11:52am

current contains static data from 2017.
new is loaded in regular intervals from OIDC Form_Post Response

Pro:

we know exactly what is loaded
community can maintain and add to the collection
more up to date, i.e. EN Wikipedia doubled since 2017

Cons:

not all datasets identified and loaded yet. Like the one with the hypernyms you mentioned. We need to track that down. It wasn’t on our list yet, see bottom of Home - DBpedia Association