Data about Brazilian cities

I don’t know what is loaded or how recent the data is. Maybe @diegomoussallem knows, or knows who does. We fixed the chapter Dockers yesterday, so next year we can update all chapters.


Hi @kurzum and @herrmann,
Sorry for disappearing, this week was quite busy for me.

@kurzum I get your point about the local offices and totally agree, but I was just referring to the language community itself, as @herrmann pointed out. Anyway, let’s make it happen; I am totally for it.

@herrmann regarding the difference in mappings in DBpedia PT: some time ago we worked on translating the entire DBpedia ontology, along with some properties, into Portuguese using Neural Machine Translation (https://twitter.com/DiegoMoussallem/status/872838862460071943?s=20), and in the process we fixed some mappings. However, that work was not finished due to a lack of human resources, so we couldn’t make it official; that’s the reason. The dump, by the way, is 2016-10.

Best


Hi, @diegomoussallem.

Are the fixed mappings you produced available anywhere for review? Sorry if this is explained in the tweet you linked to, but Twitter is blocked on my network (:warning:) at the moment. I’ll have to remember to look it up again when I’m home.

It would probably be less work to review your mappings than to create new ones from scratch.


No problem, I am also offline from the forum for weeks sometimes… (I guess we all have that).

@diegomoussallem the link in your tweet is access controlled: https://t.co/WU2RSkArd1?amp=1


Hi @herrmann and @kurzum, I will make it available along with the generated ontology file. Now I remember that I made it private to control the evaluation process in one of my papers on RDF verbalization.


@herrmann I put the DTB csv on the bus: https://databus.dbpedia.org/kurzum/ibge/dtb/2018.01.01

We have tarql mapping capabilities for the Databus client. I started mapping the table: https://github.com/dbpedia/format-mappings/blob/master/tarql/2.sparql

So if you run bin/DatabusClient -f ttl -c gz -s ibge.query, it effectively downloads the csv as ttl, compressed with gz (there is a quick inspection sketch below the query).

A question: how did you sameAs-link the municipalities with DBpedia? Did you use Nome_Município?

result: http://temporary.dbpedia.org/temporary/dtb_type%3Dmunicipio.ttl.gz

ibge.query

PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dataid-cv: <http://dataid.dbpedia.org/ns/cv#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Get all files
SELECT DISTINCT ?file WHERE {
    ?dataset dataid:version <https://databus.dbpedia.org/kurzum/ibge/dtb/2018.01.01> .
    ?dataset dcat:distribution ?distribution .
    ?distribution dcat:downloadURL ?file .
}
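
If you want to inspect the converted result without the client, here is a quick Python sketch (using rdflib and the temporary URL above; untested, treat it as a sketch):

import gzip
import urllib.request

from rdflib import Graph

# fetch the converted file from the temporary location above
url = "http://temporary.dbpedia.org/temporary/dtb_type%3Dmunicipio.ttl.gz"
with urllib.request.urlopen(url) as resp:
    turtle = gzip.decompress(resp.read()).decode("utf-8")

# load it into an in-memory graph and count the triples
g = Graph()
g.parse(data=turtle, format="turtle")
print(len(g), "triples")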

Cool to see this example of use of tarql, @kurzum.

However, the mapping that would be useful to me is the one over the PT Wikipedia data, not the IBGE one, as the former has the information I want (website links) and the latter does not. That is what @diegomoussallem has made but not yet shared.

Also, if it were possible to obtain data from Wikipedia more recent than this 2016 dump, that would be pretty useful.

I did not operate directly on an RDF graph, but instead first converted the data to tables and dataframes.

I merged a Pandas dataframe, obtained from a csv file resulting from a SPARQL query on the pt.dbpedia.org endpoint, with two dataframes derived from the IBGE csv file: first the dataframe of states (L60) and then the dataframe of municipalities (L69).

The keys used for the merge are the state name (dbo:state/rdfs:label) and the municipality name (rdfs:label). The IBGE code isn’t very useful as a key for this merge because most of the DBpedia data does not include it. But I leave it in the resulting dataset, as I think it will be useful for cross-referencing with other sources in the future.
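
In outline, the merge looks roughly like this (a simplified sketch, not my actual script; the file names and the DBpedia-side column names state_label, city_label, and the code column Cod_Municipio are placeholders):

import pandas as pd

# result of the SPARQL query on pt.dbpedia.org, exported as csv;
# "state_label" and "city_label" are placeholder column names
dbp = pd.read_csv("dbpedia_municipios.csv")

# the IBGE DTB csv; Nome_UF and Nome_Município are real DTB columns,
# "Cod_Municipio" stands in for the municipality code column
ibge = pd.read_csv("dtb_2018.csv", sep=";")
states = ibge[["Nome_UF"]].drop_duplicates()
municipios = ibge[["Nome_UF", "Nome_Município", "Cod_Municipio"]]

# first merge on the state name (dbo:state/rdfs:label vs. Nome_UF)...
merged = dbp.merge(states, left_on="state_label", right_on="Nome_UF")

# ...then on the municipality name within each state (rdfs:label vs.
# Nome_Município), keeping the IBGE code for later cross-referencing
municipios = municipios.rename(columns={"Nome_Município": "city_label"})
merged = merged.merge(municipios, on=["Nome_UF", "city_label"], how="left")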

@herrmann you are thinking in the wrong direction. I am preparing the IBGE data to be loaded into pt.dbpedia.org/sparql as well as into global.dbpedia.org. We just need the sameAs links and the tarql mapping.

Simple pattern:

  1. Get authoritative, national, open data in any machine-readable format (csv, xml, etc.)
  2. Map and link it
  3. Load it into DBpedia, i.e. the national knowledge graphs and global with the normal data

This only needs to be done once per source and maintained once per source release. But the data will be ready for everyone to consume. No more ad-hoc integration projects like yours.

Hi @herrmann

I have shared the spreadsheet with you; I am not sure if it is what you are looking for. I need some time to find the updated owl file. I am quite busy until this Friday, but I will let you know as soon as I find it.

Best regards,

Diego

What you’re proposing does make sense. However, this is the non-trivial step:

DBpedia uses the names of Wikipedia articles for its city URIs. While in many cases this is just the city name, that is not always the case: cities whose name is ambiguous with something else get a disambiguation part in parentheses, and so on. So I’m not sure the sameAs links can be established using tarql alone; you need to use the DBpedia data there as well. Is it available in the tarql context?

As I did in my script, you first need to take the federative unit name, which is the Nome_UF column in that csv. With the municipality name and the state name, we could then build a SPARQL query to get the city URI in DBpedia and establish the sameAs link. Perhaps by replacing line 14 in the tarql with something like:

{
    ?sameAs a ?city_type .
    FILTER (?city_type IN (dbo:City, dbo:Settlement))
    ?sameAs rdfs:label ?name ;
        dbo:state/rdfs:label ?Nome_UF .
}
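
If the DBpedia graph turns out not to be reachable from inside tarql, the fallback would be scripting the lookup against the endpoint. A rough sketch of what I mean (the function is hypothetical; it uses the same class filter and property path as the fragment above, against pt.dbpedia.org):

from SPARQLWrapper import SPARQLWrapper, JSON

def lookup_city_uri(name, uf, endpoint="http://pt.dbpedia.org/sparql"):
    """Find candidate DBpedia URIs for a municipality by name and state name."""
    sparql = SPARQLWrapper(endpoint)
    # naive string interpolation: fine for a sketch, not for production
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?city WHERE {
            ?city a ?city_type ;
                  rdfs:label ?label ;
                  dbo:state/rdfs:label ?uf_label .
            FILTER (?city_type IN (dbo:City, dbo:Settlement))
            FILTER (STR(?label) = "%s" && STR(?uf_label) = "%s")
        }
    """ % (name, uf))
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b["city"]["value"] for b in results["results"]["bindings"]]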

What do you think, @kurzum ?


Thanks. This seems to be a spreadsheet with the DBpedia ontology properties and their translations to Brazilian Portuguese. However, I see no column or sheet there with a reference to the PT Wikipedia templates and their properties. I was expecting something like this page from the Wikipedia mappings wiki. Have you done anything like that?

If you haven’t done it, can’t find it or don’t have the time right now, there is no problem. I am not in a hurry. :slight_smile:

Totally agree. All I am saying is that if we do this once, then DBpedia (PT and Global) will contain: 1. all municipalities, 2. the official código, and also 3. the correct website URL. This should mean that for the next dataset these are already available, and therefore linking might become easier. Going towards a sustainable linked open data effort, not one-time, ad-hoc integration.

I agree with that. I asked for

  1. your opinion on using the above code fragment as a replacement for line 14, to determine the sameAs links; and
  2. whether or not the DBpedia graph is available inside the tarql context on the Databus, to make that possible.

I have now published the mapping between the IBGE codes of Brazilian cities and DBpedia URIs:

Note that it has not been possible to map all of the municipalities with this method, but most of them are there. Only 284 out of 5570 were left unmapped.


Hi, guys! I’ve updated this mapping, which can be used to establish sameAs links. Now there are separate columns for the DBpedia URI, the Portuguese DBpedia URI, and the Wikidata URI, where available.

For some strange reason, a lot of municipalities in the state of Espírito Santo (ES) are missing URIs.
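
A quick way to see where the gaps are, assuming the mapping is exported as csv (the file and column names below are guesses; adjust them to the actual headers):

import pandas as pd

# hypothetical file and column names; adjust to the actual spreadsheet headers
df = pd.read_csv("ibge_dbpedia_mapping.csv")

# count municipalities without a DBpedia URI, grouped by state
missing = df[df["dbpedia_uri"].isna()]
print(missing.groupby("Nome_UF").size().sort_values(ascending=False))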

The DBpedia SPARQL endpoint is exhibiting some very odd behaviour. The following query executes just fine.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbp:<http://dbpedia.org/property/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX yago:<http://dbpedia.org/class/yago/>

SELECT *

WHERE {

    # select by classes
    {
        ?city a ?city_type .
        FILTER (?city_type IN (dbo:City, dbo:Settlement))
    }

    # select by properties
    UNION { ?city dbo:wikiPageWikiLink dbr:Mayor }
    UNION { ?city dbp:leaderTitle dbr:Mayor }
    UNION { ?thing dbp:city ?city }

    # restrict query to make sure those are cities in Brazil
    FILTER (
        EXISTS { ?city a dbr:Municipalities_of_Brazil } ||
        EXISTS { ?city dbo:wikiPageWikiLink dbr:States_of_Brazil } ||
        EXISTS { ?city dbo:country dbr:Brazil } ||
        EXISTS { ?city dbp:settlementType dbr:Municipalities_of_Brazil } ||
        EXISTS { dbr:List_of_municipalities_of_Brazil dbo:wikiPageWikiLink ?city }

    )

    OPTIONAL {
        ?city foaf:homepage ?link .
    }
#    OPTIONAL {
#        ?city rdfs:label ?name .
#    }
}

However, if I uncomment the last OPTIONAL clause, the query takes a very long time to execute and finally returns an empty result set.

That problem does not happen on the Portuguese DBpedia SPARQL endpoint, just on the main DBpedia one. Very strange. For now, I’m going to use just the Portuguese DBpedia.
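
For anyone who wants to reproduce the comparison, a small sketch (assuming the query above is saved as cities.rq; the endpoints and the timeout value are my choices):

from SPARQLWrapper import SPARQLWrapper, JSON

# the query above, saved to a file
query = open("cities.rq").read()

for endpoint in ("https://dbpedia.org/sparql", "http://pt.dbpedia.org/sparql"):
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    sparql.setTimeout(300)  # seconds; the main endpoint may still time out
    try:
        rows = sparql.query().convert()["results"]["bindings"]
        print(endpoint, "->", len(rows), "rows")
    except Exception as exc:
        print(endpoint, "failed:", exc)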

The query now works, even if I uncomment the last OPTIONAL part! Great! :slightly_smiling_face: