I am performing some experiments in which I need to check, for each wikilink in the abstract of a Wikipedia page (English chapter), whether there is a triple in DBpedia linking the page and the mentioned wikilink.
I first implemented this by crawling Wikipedia and running some scripts against the DBpedia SPARQL endpoint, but I’m dealing with a large number of entities, and after running for some time the bot runs into issues, I guess because I am making too many requests in short periods of time. So I’m reimplementing my scripts to work against local data dumps of Wikipedia and DBpedia.
At the moment, the only triples I need from DBpedia are the ones linking entities in the http://dbpedia.org/resource/ namespace.
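To make that filter concrete, here is a minimal sketch of what I have in mind, assuming the dump files are in N-Triples format (one triple per line). The names `resource_links`, `sample` and `linked` are just illustrative, and the sample lines are made up for the example, not taken from a real dump.

```python
import re

RESOURCE_NS = "http://dbpedia.org/resource/"

# Matches an N-Triples line whose subject, predicate and object are all IRIs.
# Lines with a literal object (e.g. "Berlin"@en) do not match and are skipped.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def resource_links(lines):
    """Yield (subject, predicate, object) for triples linking two DBpedia resources."""
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # comment, blank line, or literal-valued triple
        s, p, o = match.groups()
        # Keep the triple only if both ends are in the resource namespace.
        if s.startswith(RESOURCE_NS) and o.startswith(RESOURCE_NS):
            yield s, p, o


# Made-up sample lines, for illustration only.
sample = [
    '# a comment line',
    '<http://dbpedia.org/resource/Berlin> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Germany> .',
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
    '<http://dbpedia.org/resource/Berlin> <http://xmlns.com/foaf/0.1/homepage> <http://www.berlin.de> .',
]

# Set of (page, target) pairs, so each wikilink check is a simple membership test.
linked = {(s, o) for s, _, o in resource_links(sample)}
```

For a real dump, I would probably feed `resource_links` from `bz2.open(path, "rt", encoding="utf-8")` instead of a list, assuming the files are bz2-compressed, and then check each wikilink with something like `(page_uri, link_uri) in linked`.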
So my question is: which files am I supposed to download from the endpoint to get those triples?
My guess is that the following ones may be enough, but I’d like to confirm that I am neither missing relevant content nor processing too much information:
- Extracted facts from Wikipedia Infoboxes ? (Is this one redundant?)
Also, I’ve explained my background problem in case someone can figure out a better approach/tool to solve it =)