Latest DBpedia 2019 Release: Detailed Statistics

Hi,

I am currently working on a research paper that introduces the DBpedia knowledge graph and ontology and includes an evaluation of the timbr.ai DBpedia tool.

Is there any document that provides detailed statistics for the latest DBpedia 2019 release, as was done in earlier DBpedia reports such as “DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia” or similar?

Any help, including references to such documents, would be highly appreciated.

Best regards,

Carlos F. Enguix

@cenguix No, it has become much more complicated and is still a work in progress. The good news is that once we implement it, we can update it on a monthly basis.

Overall, we now have several main DBpedia datasets:

  1. The regular extraction is now split into modules. They work like building blocks, are published as https://databus.dbpedia.org/dbpedia/[generic|mappings|wikidata|text] and are complemented by external data such as https://databus.dbpedia.org/propan/lhd/linked-hypernyms or https://databus.dbpedia.org/kurzum/cleaned-data/geonames/2018.03.11
  2. Prefusion: we aggregate all of the components above into a prefused graph with provenance. You can find the latest files, partitioned by property, here: https://databus.dbpedia.org/vehnem/flexifusion/prefusion/2019.11.01 and browse the graph via https://global.dbpedia.org
  3. From this prefusion, we will export several use-case-specific datasets via FlexiFusion. The paper at https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf contains statistics. FlexiFusion also produces a default fusion, which is loaded into timbr.ai. I need to check whether an updated version is available yet.
  4. We also provide regular collections as well as enriched collections, which are discussed here: DBpedia Dataset 2019-08-30 (Pre-Release)

The DBpedia project would benefit immensely if you could contribute some good statistics. We could consolidate them and run them every month. Besides the ontological classification, we will also focus more on properties to partition the graph. You can already query the aggregated files by property with this Databus query
The next releases will have much better provenance in the JSON-LD.
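As a possible starting point for such contributed statistics, here is a minimal Python sketch that computes a few headline numbers (triple count, distinct subjects, distinct properties, and triples per property) in a single pass. It assumes line-based N-Triples input, optionally bz2-compressed; the function names `graph_statistics` and `open_ntriples` and the choice of metrics are hypothetical illustrations, not an official DBpedia or Databus tool.

```python
import bz2
import sys
from collections import Counter
from typing import Iterable


def graph_statistics(lines: Iterable[str]) -> dict:
    """Compute simple statistics over N-Triples lines in a single pass.

    In N-Triples, subjects (IRIs or blank nodes) and predicates (IRIs)
    contain no whitespace, so the first two whitespace-separated tokens
    of a triple line are its subject and predicate.
    """
    triples = 0
    subjects = set()
    per_predicate: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # malformed line; a production tool should log it
        subject, predicate = parts[0], parts[1]
        triples += 1
        subjects.add(subject)
        per_predicate[predicate] += 1
    return {
        "triples": triples,
        "distinct_subjects": len(subjects),
        "distinct_predicates": len(per_predicate),
        "triples_per_predicate": per_predicate,
    }


def open_ntriples(path: str):
    """Hypothetical helper: open a plain or .bz2-compressed N-Triples file as text."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python stats.py some-dump.nt.bz2
    with open_ntriples(sys.argv[1]) as handle:
        stats = graph_statistics(handle)
    print(f"triples: {stats['triples']}")
    print(f"distinct subjects: {stats['distinct_subjects']}")
    print(f"distinct predicates: {stats['distinct_predicates']}")
    for predicate, count in stats["triples_per_predicate"].most_common(10):
        print(count, predicate)
```

Because the files are already partitioned by property, running this per file and summing the results would be one way to produce monthly figures. Note that the set of distinct subjects is held in memory, which may be too large for a full dump; a sort-based count would scale better.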

Hi Sebastian,

Thanks a lot for the links, including the one to the latest DBpedia status article, “DBpedia FlexiFusion The Best of Wikipedia, Wikidata, Your Data”. I enjoyed reading it and will definitely cite it in my paper.

Best regards,

Carlos F. Enguix

@cenguix There is also a dev version of the new fusion. It is also partitioned by property, but the decision about which source to use has already been made, so it is in N-Triples rather than JSON-LD: https://databus.dbpedia.org/vehnem/flexifusion/fusion/2019.11.15

I think it is already decent, but we also found some issues that we will fix in the next release: we forgot to include the GeoNames links, and we included Wikimedia Commons, which differs too much from the other sources and therefore does not fuse well yet.