Simplifying the Chapter endpoint deployment

Hi @fab_hop,

Thanks for your work on rebooting the German chapter. I would like to consolidate this work for all chapters. Could you answer a few questions?

  1. I am not really sure which version of the DBpediaVAD Docker image you used. I think there is one from Joern and one from Magnus. Did you pick it up from the documentation, or did Magnus build something? Could you post links or the Docker image here?

  2. I wrote a query that should get the latest German data:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dataid-cv: <http://dataid.dbpedia.org/ns/cv#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?artifact ?latestVersion ?label ?comment ?file WHERE {
    {
        # Subselect latestVersion by artifact
        SELECT ?artifact (MAX(?version) AS ?latestVersion) WHERE {
            ?dataset dataid:artifact ?artifact .
            ?dataset dct:hasVersion ?version
            FILTER (?artifact IN (
                ################
                # GENERIC
                ################
                <https://databus.dbpedia.org/dbpedia/generic/article-templates>,
                <https://databus.dbpedia.org/dbpedia/generic/categories>,
                <https://databus.dbpedia.org/dbpedia/generic/citations>,
                <https://databus.dbpedia.org/dbpedia/generic/commons-sameas-links>,
                <https://databus.dbpedia.org/dbpedia/generic/disambiguations>,
                <https://databus.dbpedia.org/dbpedia/generic/external-links>,
                <https://databus.dbpedia.org/dbpedia/generic/geo-coordinates>,
                <https://databus.dbpedia.org/dbpedia/generic/homepages>,
                <https://databus.dbpedia.org/dbpedia/generic/infobox-properties>,
                <https://databus.dbpedia.org/dbpedia/generic/infobox-property-definitions>,
                # not sure if needed
                # <https://databus.dbpedia.org/dbpedia/generic/interlanguage-links>,
                <https://databus.dbpedia.org/dbpedia/generic/labels>,
                # not sure if needed
                # <https://databus.dbpedia.org/dbpedia/generic/page>,
                <https://databus.dbpedia.org/dbpedia/generic/persondata>,
                <https://databus.dbpedia.org/dbpedia/generic/redirects>,
                # not sure if needed
                # <https://databus.dbpedia.org/dbpedia/generic/revisions>,
                <https://databus.dbpedia.org/dbpedia/generic/topical-concepts>,
                # very large, useful for statistical graph analysis
                # <https://databus.dbpedia.org/dbpedia/generic/wikilinks>,
                <https://databus.dbpedia.org/dbpedia/generic/wikipedia-links>,
                ################
                # MAPPINGS
                ################
                <https://databus.dbpedia.org/dbpedia/mappings/instance-types>,
                <https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects>,
                <https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals>,
                <https://databus.dbpedia.org/dbpedia/mappings/geo-coordinates-mappingbased>,
                ################
                # TEXT (still old)
                ################
                <https://databus.dbpedia.org/dbpedia/text/short-abstracts>,
                <https://databus.dbpedia.org/dbpedia/text/long-abstracts>,
                ################
                # latest ontology, currently @denis account
                ################
                <https://databus.dbpedia.org/denis/ontology/dbo-snapshots>
            )) .
        } GROUP BY ?artifact
    }

    ?dataset dct:hasVersion ?latestVersion .
    ?dataset rdfs:label ?label .
    ?dataset rdfs:comment ?comment .
    {
        ?dataset dataid:artifact ?artifact .
        ?dataset dcat:distribution ?distribution .
        ?distribution dcat:downloadURL ?file .
        ?distribution dataid:contentVariant "de"^^xsd:string .
        # remove debug info
        MINUS {
            ?distribution dataid:contentVariant ?variants .
            FILTER (?variants IN ("disjointDomain"^^xsd:string, "disjointRange"^^xsd:string))
        }
        # Just the transitives of these two
        MINUS {
            ?dataset dataid:artifact ?a . FILTER (?a IN (
                <https://databus.dbpedia.org/dbpedia/mappings/instance-types>,
                <https://databus.dbpedia.org/dbpedia/generic/redirects>)) .
            ?dataset dcat:distribution ?distribution .
            FILTER NOT EXISTS {?distribution dataid:contentVariant "transitive"^^xsd:string}
        }
    # NTriples version of the ontology
    } UNION {
        ?dataset dataid:artifact <https://databus.dbpedia.org/denis/ontology/dbo-snapshots> .
        ?dataset dcat:distribution ?distribution .
        ?distribution dcat:mediaType <http://dataid.dbpedia.org/ns/mt#ApplicationNTriples> .
        ?distribution dcat:downloadURL ?file .
    }
} ORDER BY ?artifact

If you execute it in YASGUI, it returns 23 files, but I think you loaded more.
Could you check, or do you have a list?

The main process here is this:

  • Nobody really knows where the up-to-date information on setting up and maintaining a chapter lives, so we need to collect it all and put it in one place.
  • If we have a Docker image, we can put it on https://hub.docker.com/u/dbpedia
  • With the query above, chapters can just change “de” to their own language code and then use the Docker image. DA will work on getting all the missing files running monthly.
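
To make the "change “de” to your language" step concrete, here is a minimal sketch of a helper that rewrites the content variant in a saved copy of the query. The function name localize_query is made up for illustration, and the sketch assumes the query text contains the literal "de"^^xsd:string exactly as posted above.

```shell
# Hypothetical helper: read the German query on stdin and write a copy
# for another language on stdout. Only the contentVariant literal changes.
localize_query() {
    # Accept only non-empty, lowercase codes such as "nl" or "pt"
    # (assumption: chapter content variants look like this).
    case "$1" in
        ''|*[!a-z]*)
            echo "unexpected language code: '$1'" >&2
            return 1
            ;;
    esac
    sed "s/contentVariant \"de\"^^xsd:string/contentVariant \"$1\"^^xsd:string/"
}
```

Usage would be something like `localize_query nl < chapter-query.sparql > chapter-query-nl.sparql`, where both file names are placeholders.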

Hi Sebastian,

There is this document about chapters which needs to be updated: https://docs.google.com/document/d/1Mxb3ztYHqXt_pPKA7AWJetPjA-neo2kxRSWoO8OWqIw/edit

It is essential to have common knowledge and experiences shared in one place (folder, drive etc.), because it is sometimes really difficult to collect the necessary information.

Hi Sebastian,

Sorry, I was busy over the last few days and couldn’t answer your questions promptly.

Regarding your questions:

  1. We used this https://hub.docker.com/r/dbpedia/virtuoso/ Docker image. It is based on Joern’s Virtuoso image, but already has the DBpedia plugin installed. It might be useful to link the sources (https://github.com/dbpedia/Dockerized-DBpedia ?) for this image in the Docker Hub description. Additionally, the way the “Local Chapter Setup” description ensures that the folders are empty is not ideal, because rm -rf "$db_dir"/* might delete more than expected if the variable somehow remains empty (for example because it was never created or was assigned incorrectly).
  2. As briefly mentioned during the presentation (or at least in the slides), we used the download script provided by the DBpedia Docker image (https://github.com/dbpedia/Dockerized-DBpedia/blob/master/download.sh). This script should download the data from http://downloads.dbpedia.org/2016-10/core-i18n/de/. It does not use the Databus. Consequently, our knowledge about the Databus is quite limited. Nevertheless, it is interesting that the query appears to miss a few German resources.
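
A minimal sketch of a safer cleanup, assuming the data directory is held in a shell variable named db_dir as in the setup description. The function name clear_db_dir is hypothetical and not part of the existing scripts.

```shell
# Empty the data directory, but refuse to run if db_dir is unset, empty,
# or does not point to an existing directory.
clear_db_dir() {
    if [ -z "${db_dir:-}" ] || [ ! -d "$db_dir" ]; then
        echo "refusing to clean: db_dir is empty or not a directory" >&2
        return 1
    fi
    # ${db_dir:?} is a second safeguard: the expansion fails instead of
    # silently turning into "/*" if the variable is empty.
    # Note that the * glob does not match hidden files.
    rm -rf -- "${db_dir:?}"/*
}
```

With this guard, a forgotten or misspelled assignment produces an error message instead of a delete rooted at /.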

I hope this answers your questions. We are happy to help you to consolidate this work for other language chapters. Please let us know if you need other information.

@fab_hop thanks for the info. I think we can take it from there.
The major change needed to switch to the Databus is basically to replace this line: https://github.com/dbpedia/Dockerized-DBpedia/blob/master/download.sh#L177 with the result of the above SPARQL query. The code can also be shorter then.
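
As a rough sketch of what that replacement could look like: assume the ?file column of the query result has been exported to a text or CSV file with one URL per line. The function name download_files and its arguments are hypothetical, and this is not how download.sh is currently written.

```shell
# Download every http(s) URL listed in a file (one per line) into a
# target directory. Arguments: <url-list-file> <target-dir>
download_files() {
    mkdir -p "$2" || return 1
    # The "|| [ -n ... ]" part also handles a last line without newline.
    while IFS= read -r url || [ -n "$url" ]; do
        # Drop double quotes and carriage returns that CSV exports may add.
        url=$(printf '%s' "$url" | tr -d '"\r')
        case "$url" in
            http://*|https://*)
                wget --continue --directory-prefix="$2" "$url"
                ;;
            # Anything else (header row, blank lines) is skipped.
        esac
    done < "$1"
}
```

A call such as `download_files files.csv ./data` (both names are placeholders) would then fetch whatever the query returned, so the list of files no longer has to be hard-coded in the script.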

Databus also adds some more features, i.e.:

  • it can be updated monthly
  • you can add a UNION to the query and also load other datasets from the bus automatically (e.g. https://databus.dbpedia.org/kurzum/cleaned-data/geonames/2018.03.11). This means that each chapter can better adapt to language- or country-specific needs or add its own datasets to the endpoint
  • we are currently preparing monthly enriched releases. Previously, only the main endpoint had data from several languages; now we can produce enriched versions, i.e. the language + other languages + Wikidata + other datasets. This generifies the en_uris and wd_uris variants. It means that each language or national endpoint can quickly grow and adapt.
  • we might be able to create better plugins for debugging, so the linked data HTML view can contain more useful links.

I will try to execute the download script to check exactly whether we have all files in the monthly releases. For example, we are not producing .tql files each month, and maybe never will if nobody asks.

@fab_hop: did Magnus do anything for this deployment? I only see https://github.com/dbpedia/Dockerized-DBpedia with a one-year-old commit that fails in Travis. It seems like you used it as is and it worked fine.

For this particular deployment, Magnus did nothing, at least to my knowledge.

The Docker image worked fine for us. Either there is a difference between the GitHub and the Docker Hub versions, or the failing tests are simply not required for the basic setup.