Number of types different - SPARQL vs. instance-types_lang=en_transitive.ttl.bz2

m1ci · January 20, 2021, 8:50am

From https://twitter.com/datao/status/1351807141196288002

The SPARQL endpoint of DBpedia returns a lot of types for Victor Hugo (cf. URL below), whereas a grep in “instance-types_lang=en_transitive.ttl.bz2” only returns 5 (cf. screenshot). Which file from DBpedia contains the rest of the types of Victor Hugo?

datao · January 20, 2021, 9:03am

The URL mentioned in the tweet is a simple query that returns the types of the resource dbpedia:Victor_Hugo.
https://yasgui.triply.cc/#query=select%20%3Ft%20where%20{graph%20<http%3A%2F%2Fdbpedia.org>%20{%20<http%3A%2F%2Fdbpedia.org%2Fresource%2FVictor_Hugo>%20rdf%3Atype%20%3Ft}%20}&endpoint=http%3A%2F%2Fdbpedia.org%2Fsparql&requestMethod=GET&tabTitle=Victor%20Hugo%20%40%20DBPedia.org&headers={}&contentTypeConstruct=application%2Fn-triples%2C*%2F*%3Bq%3D0.9&contentTypeSelect=application%2Fsparql-results%2Bjson%2C*%2F*%3Bq%3D0.9&outputFormat=table&outputSettings={"pageSize"%3A-1}
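For readers who would rather run the query outside YASGUI, here is a minimal Python sketch of the same request. The function names are hypothetical and only the standard library is used. It assumes the endpoint accepts a plain SPARQL protocol GET request and returns the standard SPARQL JSON results format. It also spells out the full IRI of rdf:type instead of relying on a predefined prefix.

```python
import json
import urllib.parse
import urllib.request

# Full IRI of rdf:type, so the query does not depend on a predefined prefix.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_types_query(resource_iri, graph_iri="http://dbpedia.org"):
    """Return a SPARQL query listing the rdf:type values of one resource."""
    return (
        "SELECT ?t WHERE { "
        f"GRAPH <{graph_iri}> {{ <{resource_iri}> <{RDF_TYPE}> ?t }} "
        "}"
    )


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a SPARQL protocol GET."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def fetch_types(resource_iri, endpoint="http://dbpedia.org/sparql"):
    """Send the query (needs network access) and return the type IRIs."""
    request = urllib.request.Request(
        build_request_url(endpoint, build_types_query(resource_iri)),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [row["t"]["value"] for row in payload["results"]["bindings"]]
```

Calling `fetch_types("http://dbpedia.org/resource/Victor_Hugo")` should list the same types as the YASGUI link, provided the endpoint is reachable.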

m1ci · January 20, 2021, 9:20am

Hi @datao,

The SPARQL endpoint is loaded with a collection of data artifacts (datasets), which are listed here:
https://databus.dbpedia.org/dbpedia/collections/latest-core
Type statements are provided by the following datasets:

Instance types transitive (the one you are looking at): https://downloads.dbpedia.org/repo/dbpedia/mappings/instance-types/2020.12.01/instance-types_lang=en_transitive.ttl.bz2
YAGO types (you have not considered): https://vmdbpedia.informatik.uni-leipzig.de/repo/vehnem/yago/instance-types/2016.10.01/instance-types_tag=specific.ttl.bz2
SDTypes types (you have not considered): https://downloads.dbpedia.org/repo/dbpedia/transition/sdtypes/2016.10.01/sdtypes_lang=en.ttl.bz2

I grepped for http://dbpedia.org/resource/Victor_Hugo in these 3 datasets and got 99 statements in total, which equals what you get via the SPARQL endpoint.
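The same check can be sketched in Python, in case grep is not at hand. This is only an illustration: the function name is hypothetical, and it assumes the dumps are bzip2-compressed with one triple per line in N-Triples style. Unlike a plain grep for the URL, it matches the resource only as the subject of an rdf:type statement, so the counts may differ slightly from grep output.

```python
import bz2

# rdf:type as it appears in an N-Triples style line.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def count_type_statements(paths, resource_iri):
    """Count rdf:type statements about resource_iri in each .bz2 dump.

    Returns a dict mapping each path to its number of matching lines.
    """
    # The trailing space keeps longer IRIs sharing this prefix from matching.
    subject = f"<{resource_iri}> "
    counts = {}
    for path in paths:
        with bz2.open(path, "rt", encoding="utf-8") as dump:
            counts[path] = sum(
                1
                for line in dump
                if line.startswith(subject) and RDF_TYPE in line
            )
    return counts
```

Summing the values of the returned dict over the three downloaded files gives the total to compare with the number of rows returned by the endpoint.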

In summary, the SPARQL endpoint loads a specific collection of datasets (not all of them) that are extracted by the DBpedia extraction framework or hosted on the DBpedia Databus platform.

Hope this answers your question.

datao · January 22, 2021, 8:40am

It does. Thank you very much.

I have had a look inside the Yago and SDTypes directories.
The files have pretty recent timestamps.
Why not change the directory names to reflect that freshness?
(2016.10.01/ is misleading)

datao · January 22, 2021, 8:58am

A nice-to-have proposal:
Currently I need to update a live instance of DBPedia from 9 months ago, and I cannot retrieve the corresponding Yago and SDTypes files.
It is probably a good idea to put the different iterations of the Yago and SDTypes files in separate timestamped directories, just as you do for the instance-types file and most of the other files of DBPedia.

m1ci · January 22, 2021, 9:08am

For SDTypes we have only the 2016/10/01 version; a newer version has not been created.

As for YAGO, as far as I know it is on the todo list to include newer releases. @hopver might have more info on this.

“Currently I need to update a live instance of DBPedia from 9 months ago, and I cannot retrieve the corresponding Yago and SDTypes files.”

They are available, aren't they?

datao · February 20, 2021, 12:56am

Hi again.
I am now stuck trying to find which file of the DBPedia dataset (i.e. latest-core) includes this triple:

dbr:Daffy_Duck rdf:type dbo:FictionalCharacter

It is right there on the dbpedia endpoint: https://yasgui.triply.cc/#query=select%20*%20where%20{<http%3A%2F%2Fdbpedia.org%2Fresource%2FDaffy_Duck>%20%3Fp%20%3Fo} &endpoint=https%3A%2F%2Fdbpedia.org%2Fsparql&requestMethod=POST&tabTitle=Query&headers={}&contentTypeConstruct=application%2Fn-triples%2C*%2F*%3Bq%3D0.9&contentTypeSelect=application%2Fsparql-results%2Bjson%2C*%2F*%3Bq%3D0.9&outputFormat=table

But it is nowhere to be found in the files discussed above in this thread.

Any help is very welcome.