Data about Brazilian cities

I thought you meant the “test this mapping” link on the second line of that page. That one gives an exception.

Error

Exception: org.xml.sax.SAXParseException; lineNumber: 22; columnNumber: 3; The element type "img" must be terminated by the matching end-tag "</img>".
Stacktrace: org.xml.sax.SAXParseException; lineNumber: 22; columnNumber: 3; The element type "img" must be terminated by the matching end-tag "</img>".
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1709)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2900)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:489)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
    at
(…)

Now I’ve noticed the validate button. The error message is not very helpful: it gives no line number, so finding exactly which property has a problem is left to guesswork.

Anyway, I think I figured out a way to get the data the way I want it just by using a SPARQL query, so I have no need to edit the mapping. Once I get it done I’ll post it here so other people don’t have to jump through the same hoops.

Of course I could edit the mappings anyway, in order to improve the data on DBPedia, once I understand the discrepancies I reported a few messages back.

Here is my final (works fine so far) version of the query.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX dbp:<http://dbpedia.org/property/>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX ptdbp:<http://pt.dbpedia.org/property/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>

SELECT ?city, ?name, ?state, ?link, ?link_prefeitura, ?link_camara, ?external_link WHERE {
    ?city a dbo:City .
    FILTER (
        EXISTS { ?city dbo:wikiPageWikiLink dbr:States_of_Brazil } ||
        EXISTS { ?city ptdbp:wikiPageUsesTemplate <http://pt.dbpedia.org/resource/Predefinição:Info/Município_do_Brasil> }
    )
    OPTIONAL { ?city foaf:homepage ?link }
    OPTIONAL {
        ?city dbo:wikiPageExternalLink ?external_link .
        FILTER REGEX(STR(?external_link), ".gov.br")
    }
    OPTIONAL {?city rdfs:label ?name}
    OPTIONAL {?city dbp:estado ?state}
    OPTIONAL {?city dbo:state/rdfs:label ?state}
    OPTIONAL {?city ptdbp:siteCâmara ?link_camara}
    OPTIONAL {?city ptdbp:sitePrefeitura ?link_prefeitura}
}
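
A side note on the REGEX filter, offered as a hedged sketch rather than a change to the query: in a regular expression an unescaped dot matches any character, so a pattern like ".gov.br" also accepts strings that merely resemble the suffix. The Python snippet below illustrates the difference with invented URLs. The stricter pattern is an assumption of mine; in a SPARQL string literal it would be written "\\.gov\\.br".

```python
import re

# The pattern as written in the query: each dot matches ANY character.
loose = re.compile(r".gov.br")
# Hypothetical stricter variant: escaped dots match only a literal ".".
strict = re.compile(r"\.gov\.br")

# Invented URLs, used only to show the difference.
official = "http://www.example.sp.gov.br/"
lookalike = "http://example.org/xgovxbr"

print(bool(loose.search(official)), bool(loose.search(lookalike)))    # True True
print(bool(strict.search(official)), bool(strict.search(lookalike)))  # True False
```

In practice the loose pattern rarely produces false positives on real external links, so this matters mostly when the results are consumed without manual review.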

See:

I did not need to edit the mappings, after all. The mappings are wrong, but the data is right. I still don’t understand why. Perhaps another (correct) mapping is being used by the Portuguese DBPedia instead? @diegomoussallem? I could edit the mapping and try to improve it, once I understand this.

@herrmann yeah, let me explain it to you.

Your query does the following:

  1. ?city selects the part that you are interested in with the two EXISTS clauses. ?city dbo:wikiPageWikiLink dbr:States_of_Brazil is quite imprecise. ?city a dbo:Settlement might be better.
  2. the second part is where you map and filter the raw data to what you want, e.g.
    REGEX(STR(?external_link), ".gov.br")

If you look at this query you would find more interesting dbp: properties:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX dbp:<http://dbpedia.org/property/>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX ptdbp:<http://pt.dbpedia.org/property/>

SELECT DISTINCT ?p WHERE {
    ?city dbo:wikiPageWikiLink dbr:States_of_Brazil .
    FILTER NOT EXISTS { ?city ptdbp:wikiPageUsesTemplate <http://pt.dbpedia.org/resource/Predefinição:Info/Município_do_Brasil> }

    ?city ?p ?o .
}
ORDER BY ?p

such as http://dbpedia.org/property/site and http://dbpedia.org/property/siteOficial

With the mappings you can map those a priori to the right ontology property, and then the queries get easier. Otherwise, everybody needs to write more complicated queries, e.g. you could add:

OPTIONAL {
  {?city ptdbp:siteCâmara ?link_camara} UNION {?city dbp:site ?link_camara} }

(not sure if correct)

But the idea here is that the mappings help you to query the data more easily and sometimes help with parsing. It is still entirely possible to write queries with many OPTIONALs and UNIONs and other rules over dbp: and ptdbp:.
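
As a rough illustration of the alternative to putting everything in the query: the separate link columns can also be folded into one value on the client side after downloading the results. This is only a sketch. The column names follow the SELECT variables used in this thread, while the function name `first_link`, the preference order and the sample rows are my assumptions.

```python
# Column names follow the SELECT variables of the query in this thread;
# the preference order is an assumption, not something the thread specifies.
PREFERRED = ("link_prefeitura", "link", "external_link")


def first_link(row, columns=PREFERRED):
    """Return the first non-empty value among the given columns, or None."""
    for column in columns:
        value = (row.get(column) or "").strip()
        if value:
            return value
    return None


# Invented result rows, shaped like rows of a CSV export of the query.
rows = [
    {"city": "A", "link_prefeitura": "", "link": "http://a.example/", "external_link": ""},
    {"city": "B", "link_prefeitura": "http://b.example/", "link": "", "external_link": ""},
    {"city": "C"},
]
print([first_link(r) for r in rows])
# ['http://a.example/', 'http://b.example/', None]
```

This keeps the query simple, at the cost of every consumer repeating the same folding step, which is the burden the mappings are meant to remove.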

Thank you, @kurzum, your suggestions make total sense.

You’re right. By including dbo:Settlement type resources I got a lot more results to my query.

But (and this is important), the purpose of using ?city dbo:wikiPageWikiLink dbr:States_of_Brazil is to restrict the query to include only Brazilian cities, and not cities anywhere else around the world. I could find no better way to filter results. For instance, the dbp:país (country) property is totally unreliable – I couldn’t find a single resource that used this property pointing to Brazil. I welcome other suggestions on how to restrict the query to include Brazil only.

Yes, thank you, dbp:site and dbp:siteOficial did in fact give me more links to work with. On the other hand, other properties that sound useful like dbp:website and dbp:websiteGoverno were so sparsely used as to be almost useless.

However, looking at those properties gave me an idea to broaden the search even more, by including this part:

    UNION { ?city dbp:prefeito ?mayor }
    UNION { ?thing dbp:cidade ?city }

With that, here is my current version of the query:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX dbp:<http://dbpedia.org/property/>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX ptdbp:<http://pt.dbpedia.org/property/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>

SELECT ?city, ?name, ?state, ?link, ?link_prefeitura, ?link_camara, ?external_link, ?link_site, ?link_site_oficial WHERE {
    
    # select by classes
    {
        ?city a ?city_type .
        FILTER (?city_type IN (dbo:City, dbo:Settlement))
    }
    
    # select by properties
    UNION { ?city dbp:prefeito ?mayor }
    UNION { ?thing dbp:cidade ?city }
    
    # restrict query to make sure those are cities in Brazil
    FILTER (
        EXISTS { ?city dbo:wikiPageWikiLink dbr:States_of_Brazil } ||
        EXISTS { ?city ptdbp:wikiPageUsesTemplate <http://pt.dbpedia.org/resource/Predefinição:Info/Município_do_Brasil> }
    )
    
    # get the properties likely to contain links
    OPTIONAL { ?city foaf:homepage ?link }
    OPTIONAL {
        ?city dbo:wikiPageExternalLink ?external_link .
        FILTER REGEX(STR(?external_link), ".gov.br")
    }
    OPTIONAL {?city rdfs:label ?name}
    OPTIONAL {?city dbp:estado ?state}
    OPTIONAL {?city dbo:state/rdfs:label ?state}
    OPTIONAL {?city ptdbp:siteCâmara ?link_camara}
    OPTIONAL {?city ptdbp:sitePrefeitura ?link_prefeitura}
    OPTIONAL {?city ptdbp:site ?link_site}
    OPTIONAL {?city ptdbp:siteOficial ?link_site_oficial}
}

Yes, I suppose I could edit the mappings to simplify my query quite a bit. What I still don’t understand about the mappings (and I keep repeating this) is why the mappings don’t match the data. It’s as if the data was produced by a different version of the mapping. Take a look at this:

How is it possible to have an incorrect mapping produce correct data? :thinking:

I don’t know what is loaded or when the data is from. Maybe @diegomoussallem knows, or knows who does. We fixed the chapter dockers yesterday, so next year we can update all chapters.

Hi @kurzum and @herrmann,
Sorry for disappearing, this week was quite busy for me.

@kurzum I got your point about the local offices and totally agree with it, but I was just mentioning the language community itself, as @herrmann pointed out. Anyway, let’s make it possible, I am totally for it.

@herrmann regarding the difference in the mappings in DBpedia PT: some time ago we worked on translating the entire DBpedia ontology to Portuguese, along with some properties, using neural machine translation (https://twitter.com/DiegoMoussallem/status/872838862460071943?s=20), and consequently we fixed some mappings. However, this work was not finished due to lack of human resources and we couldn’t make it official. That’s the reason for the difference, but the dump is from 2016-10.

Best

Hi, @diegomoussallem.

Are the fixed mappings you produced available anywhere for review? Sorry if this is explained in the tweet you linked to, but Twitter is blocked in my network (:warning:) at the moment. I have to remember to look it up again when I’m home.

It would probably be less work to review your mappings than to create new ones from scratch.

NP, I am also offline from the forum for weeks sometimes… (guess, we all have that).

@diegomoussallem the link in your tweet is access controlled: https://t.co/WU2RSkArd1?amp=1

Hi @herrmann and @kurzum, I will make it available along with the generated ontology file. Now I remember that I made it private for controlling the evaluation process in one of my papers regarding RDF verbalization.

@herrmann I put the DTB csv on the bus: https://databus.dbpedia.org/kurzum/ibge/dtb/2018.01.01

We have tarql mapping capabilities for the databus client. I started mapping the table: https://github.com/dbpedia/format-mappings/blob/master/tarql/2.sparql

So if you run bin/DatabusClient -f ttl -c gz -s ibge.query, it effectively downloads the csv as ttl, compressed with gz.

A question: how did you create the sameAs links between the municipalities and DBpedia? Did you use Nome_Município?

result: http://temporary.dbpedia.org/temporary/dtb_type%3Dmunicipio.ttl.gz

ibge.query

PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dataid-cv: <http://dataid.dbpedia.org/ns/cv#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat:  <http://www.w3.org/ns/dcat#>

# Get all files
SELECT DISTINCT ?file WHERE {
    ?dataset dataid:version <https://databus.dbpedia.org/kurzum/ibge/dtb/2018.01.01> .
    ?dataset dcat:distribution ?distribution .
    ?distribution dcat:downloadURL ?file .
}

Cool to see this example of use of tarql, @kurzum.

However, the mapping that would be useful to me is the one over the PT Wikipedia data, not the IBGE one, as the former has the information I want (website links) and the latter does not. That is what @diegomoussallem has made but not yet shared.

Also if it is possible to obtain data from Wikipedia more recent than this 2016 dump it would be pretty useful.

I did not operate directly on an RDF graph, but instead converted the data first to tables and dataframes.

I merged a Pandas dataframe obtained from a csv file resulting from a SPARQL query on the pt.dbpedia.org endpoint with two dataframes derived from the IBGE csv file: first the dataframe of states (L60) and then the dataframe of municipalities (L69).

The keys used for the merge are the state name (dbo:state/rdfs:label) and the municipality name (rdfs:label). The IBGE code isn’t very useful as a key for this merge because most of the DBPedia data does not include it. But I leave it in the resulting dataset, as I think it will be useful for cross-referencing with other sources in the future.
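
To make the merge key concrete, here is a minimal sketch in plain Python standing in for the Pandas merge, not the actual script. The column names Nome_UF and Nome_Município come from the IBGE csv as mentioned in this thread. The `codigo_ibge` column name, the function name and all sample rows are assumptions for illustration, and the codes are placeholders rather than real IBGE codes.

```python
def merge_on_names(dbpedia_rows, ibge_rows):
    """Left-join DBpedia rows with IBGE rows on (state name, municipality name)."""
    ibge_by_key = {
        (row["Nome_UF"].strip(), row["Nome_Município"].strip()): row
        for row in ibge_rows
    }
    merged = []
    for row in dbpedia_rows:
        key = (row["state"].strip(), row["name"].strip())
        match = ibge_by_key.get(key)
        # Keep every DBpedia row; the code stays None when nothing matches.
        merged.append({**row, "codigo_ibge": match["codigo_ibge"] if match else None})
    return merged


# Invented sample rows; the codes are placeholders, not real IBGE codes.
ibge_rows = [
    {"Nome_UF": "Piauí", "Nome_Município": "Bom Jesus", "codigo_ibge": "PLACEHOLDER-1"},
    {"Nome_UF": "Rio Grande do Sul", "Nome_Município": "Bom Jesus", "codigo_ibge": "PLACEHOLDER-2"},
]
dbpedia_rows = [
    {"city": "http://example.org/city/1", "name": "Bom Jesus", "state": "Rio Grande do Sul"},
    {"city": "http://example.org/city/2", "name": "Somewhere Else", "state": "Piauí"},
]
for row in merge_on_names(dbpedia_rows, ibge_rows):
    print(row["name"], row["state"], row["codigo_ibge"])
```

The two "Bom Jesus" rows show why the state name has to be part of the key: the municipality name alone is not unique across states.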

@herrmann you are thinking in the wrong direction. I am preparing the IBGE data to be loaded into pt.dbpedia.org/sparql as well as into global.dbpedia.org. We just need the sameAs links and the tarql mapping.

Simple pattern:

  1. get authoritative, national, open data in any machine readable format (csv, xml, etc.)
  2. Map and link it
  3. load it into DBpedia, i.e. the national knowledge graphs and global with the normal data

This only needs to be done once per source and maintained once per source release. But the data will be ready for everyone to consume. No more ad-hoc integration projects like yours.
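
Step 2 of the pattern ("map and link") can be pictured roughly as follows. This is plain Python emitting N-Triples, not tarql, and it is only a sketch: the `code` column, the example.org namespace, the function names and the sample data are all hypothetical. Only the Nome_UF and Nome_Município column names come from this thread.

```python
import csv
import io

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespace for the source records; not a real IBGE or DBpedia URI scheme.
BASE = "http://example.org/ibge/municipio/"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_ntriples(row, links):
    """Map one CSV row to N-Triples lines; add owl:sameAs when a link is known."""
    subject = f"<{BASE}{row['code']}>"
    lines = [f'{subject} <{RDFS_LABEL}> "{escape_literal(row["Nome_Município"])}"@pt .']
    target = links.get(row["code"])
    if target:
        lines.append(f"{subject} <{OWL_SAME_AS}> <{target}> .")
    return lines


# Invented sample data: placeholder code and placeholder target URI.
sample_csv = "code,Nome_UF,Nome_Município\n0000000,Piauí,Bom Jesus\n"
links = {"0000000": "http://example.org/dbpedia/Bom_Jesus"}

for row in csv.DictReader(io.StringIO(sample_csv)):
    print("\n".join(row_to_ntriples(row, links)))
```

The `links` dictionary is the part that has to come from somewhere else: the mapping itself is mechanical, while the sameAs targets need a separate lookup.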

Hi @herrmann

I have shared the spreadsheet with you, though I am not sure if it is what you are looking for. I need some time to find the updated OWL file. I am quite busy until this Friday and I will let you know as soon as I get it.

Best regards,

Diego

What you’re proposing does make sense. However, this is the non-trivial step:

DBPedia uses the names of Wikipedia articles for its city URIs. In many cases the article name is just the city name, but not always. Cities whose name is ambiguous with something else get a disambiguation part in parentheses, and so on. So I’m not sure the sameAs link can be established by using tarql alone. You need to use the DBPedia data there as well. Is it available in the tarql context?
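
As a small illustration of the naming problem, here is a sketch of how a trailing disambiguation part could be stripped before comparing names. The function name and the sample labels are made up, and the approach is an assumption rather than what my script does. It would also mangle a name that legitimately ends in a parenthesised part.

```python
import re

# Matches one trailing "(...)" group, with optional surrounding whitespace.
_DISAMBIGUATION = re.compile(r"\s*\([^()]*\)\s*$")


def base_name(label):
    """Drop one trailing parenthesised qualifier, e.g. 'X (Y)' -> 'X'."""
    return _DISAMBIGUATION.sub("", label).strip()


print(base_name("Bom Jesus (Piauí)"))  # Bom Jesus
print(base_name("Bom Jesus"))          # Bom Jesus
```

Stripping the qualifier makes the names comparable, but it also throws away the very information that tells two homonymous cities apart, so the state still has to be matched separately.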

As I did in my script, you need to first take the federation unit name, which is the Nome_UF column in that csv. With the municipality name and state name we could then build a SPARQL query to get the city URI in DBPedia, and then establish the sameAs link. Perhaps by replacing line 14 in the tarql with something like:

{
    ?sameAs a ?city_type .
    FILTER (?city_type IN (dbo:City, dbo:Settlement))
    ?sameAs rdfs:label ?name ;
        dbo:state/rdfs:label ?Nome_UF .
}

What do you think, @kurzum?


Thanks. This seems to be a spreadsheet with the DBPedia Ontology properties and their translation to Brazilian Portuguese. However, I see no column or sheet there with a reference to the PT Wikipedia templates and their properties. I was expecting something like this page from the Wikipedia mappings wiki. Have you done anything like that?

If you haven’t done it, can’t find it or don’t have the time right now, there is no problem. I am not in a hurry. :slight_smile:

Totally agree. All I am saying is that if we do this once, then DBpedia (PT and Global) will contain: 1. all municipalities, 2. the official codigo and also 3. the correct website URL. This should entail that these are available for the next dataset, and therefore linking might become easier. The goal is a sustainable linked open data effort, not a one-time, ad-hoc integration.

I agree with that. I asked for

  1. your opinion on using the above code fragment in replacement for line 14 for determining the sameAs links; and
  2. whether or not the DBPedia graph is available inside the tarql context in Databus, to make that possible.

I have now published the mapping between the IBGE code of Brazilian cities and DBPedia URIs:

Note that it has not been possible to map all of the municipalities with this method, but most of them are there. Only 284 out of 5570 were left unmapped.
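
For reference, the coverage implied by those two numbers works out as follows (a trivial sketch; both figures are taken from the sentence above):

```python
total = 5570      # municipalities in total, figure from the post
unmapped = 284    # municipalities left without a DBPedia URI, figure from the post

mapped = total - unmapped
print(mapped, f"{mapped / total:.1%}", f"{unmapped / total:.1%}")
# 5286 94.9% 5.1%
```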

Hi, guys! I’ve updated this mapping that can be used to establish sameAs links. Now there are separate columns for the DBPedia URI, Portuguese DBPedia URI and Wikidata URI, where available.

For some strange reason, a lot of municipalities in the state of Espírito Santo (ES) are missing URIs.

The DBPedia SPARQL endpoint is exhibiting some very odd behaviour. The following query executes just fine.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbp:<http://dbpedia.org/property/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX yago:<http://dbpedia.org/class/yago/>

SELECT *

WHERE {

    # select by classes
    {
        ?city a ?city_type .
        FILTER (?city_type IN (dbo:City, dbo:Settlement))
    }

    # select by properties
    UNION { ?city dbo:wikiPageWikiLink dbr:Mayor }
    UNION { ?city dbp:leaderTitle dbr:Mayor }
    UNION { ?thing dbp:city ?city }

    # restrict query to make sure those are cities in Brazil
    FILTER (
        EXISTS { ?city a dbr:Municipalities_of_Brazil } ||
        EXISTS { ?city dbo:wikiPageWikiLink dbr:States_of_Brazil } ||
        EXISTS { ?city dbo:country dbr:Brazil } ||
        EXISTS { ?city dbp:settlementType dbr:Municipalities_of_Brazil } ||
        EXISTS { dbr:List_of_municipalities_of_Brazil dbo:wikiPageWikiLink ?city }

    )

    OPTIONAL {
        ?city foaf:homepage ?link .
    }
#    OPTIONAL {
#        ?city rdfs:label ?name .
#    }
}

However, if I uncomment the last OPTIONAL clause, the query takes very long to execute, and finally returns an empty set.

That problem does not happen on the Portuguese DBPedia SPARQL endpoint, just on the main DBPedia one. Very strange. For now, I’m going to use just the Portuguese DBPedia.

The query now works, even if I uncomment the last OPTIONAL part! Great! :slightly_smiling_face: