DBPedia REST API - use live endpoint or self-hosted

jvwcd · June 4, 2020, 8:32am

Hello,

I’m quite new to this forum, so my apologies if this is posted in the wrong section.
I’m currently working on a project in which I need to gather textual descriptions of DBpedia resources, e.g. http://dbpedia.org/page/Ship_transport. For each resource I’ll request the JSON format in order to process the abstract or the rdfs:comment from it.
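
For illustration, here is a minimal sketch of that extraction step in Python. It assumes the JSON for a resource is served at `http://dbpedia.org/data/<name>.json` and is laid out as subject URI, then predicate URI, then a list of values with `value` and `lang` keys; both the URL pattern and the layout are assumptions to check against a real response. The function names are hypothetical.

```python
import json
import urllib.request

DBO_ABSTRACT = "http://dbpedia.org/ontology/abstract"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


def pick_description(doc, resource_uri, lang="en"):
    """Return the abstract in `lang`, falling back to rdfs:comment, else None."""
    # Assumed layout: {subject_uri: {predicate_uri: [{"value": ..., "lang": ...}]}}
    properties = doc.get(resource_uri, {})
    for predicate in (DBO_ABSTRACT, RDFS_COMMENT):
        for value in properties.get(predicate, []):
            if value.get("lang") == lang:
                return value.get("value")
    return None


def fetch_description(name, lang="en"):
    """Fetch one resource over HTTP; the URL pattern is an assumption."""
    url = f"http://dbpedia.org/data/{name}.json"
    with urllib.request.urlopen(url, timeout=30) as response:
        doc = json.load(response)
    return pick_description(doc, f"http://dbpedia.org/resource/{name}", lang)
```

Something like `fetch_description("Ship_transport")` would then return the English text if the assumptions hold, and `None` if no matching value is found.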

The calls I need to make might end up being “bursty” in nature, in that the resources I need to fetch data for come in collections of tens to a couple of hundred. I’m not sure what DBpedia’s fair use policy is, which is why I’m wondering whether it would make sense to host the same REST API that dbpedia.org provides myself. The dataset is public, so I can imagine I’m not the first person to consider this, but I can’t find much documentation on it.

Any help with making the fetching of DBpedia resources more robust is very welcome!

Kind regards,

Jan

jfrey · June 5, 2020, 10:24am

Hey @jvwcd, welcome.
The policy for the SPARQL endpoint is quite clear:

maximum request rate = 100 (requests per second per IP address, with an initial burst of 120 requests)
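
On the client side, one way to stay under a limit like this during bursts is a token bucket. The sketch below is only an illustration: the `TokenBucket` class is hypothetical, its defaults simply mirror the numbers quoted above, and the clock and sleep functions are injectable so it can be tested without waiting.

```python
import time


class TokenBucket:
    """Client-side throttle: allows `burst` calls at once, then `rate` per second."""

    def __init__(self, rate=100.0, burst=120, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate              # tokens added per second
        self.capacity = burst         # maximum tokens held at any time
        self.tokens = float(burst)    # start full so the initial burst is allowed
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            now = self.clock()
            # Refill in proportion to the elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Wait just long enough for the missing fraction of a token.
            self.sleep((1 - self.tokens) / self.rate)
```

Calling `bucket.acquire()` before each HTTP request keeps a single process within the configured rate. It does not coordinate several processes or machines behind the same IP address, so in that case a lower per-process rate would be the safer choice.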

In recent months we have worked on making it easier to set up your own SPARQL endpoint with DBpedia data; see e.g. Using DBpedia offline / Setup own SPARQL endpoint and https://github.com/dbpedia/Dockerized-DBpedia as starting points. This has the advantage that you have fine-grained control over which data you load and, of course, performance that depends only on your own setup.
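
With a SPARQL endpoint, public or self-hosted, a whole collection of resources can also be fetched in one request instead of one call per resource. Here is a rough sketch in Python using a `VALUES` clause. The function names are hypothetical, the default endpoint URL `http://localhost:8890/sparql` is only an assumption about a local setup, and the URIs are inserted into the query without any escaping, so they should come from a trusted source.

```python
import json
import urllib.parse
import urllib.request


def build_abstract_query(resource_uris, lang="en"):
    """Build one SPARQL query returning the abstracts of all given resources."""
    values = " ".join(f"<{uri}>" for uri in resource_uris)
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?resource ?abstract WHERE {\n"
        f"  VALUES ?resource {{ {values} }}\n"
        "  ?resource dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def fetch_abstracts(resource_uris, endpoint="http://localhost:8890/sparql", lang="en"):
    """Send the query and return {resource URI: abstract}. Endpoint URL is an assumption."""
    params = urllib.parse.urlencode({"query": build_abstract_query(resource_uris, lang)})
    request = urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        results = json.load(response)
    # Standard SPARQL JSON results: one binding dict per result row.
    return {
        row["resource"]["value"]: row["abstract"]["value"]
        for row in results["results"]["bindings"]
    }
```

A collection of a few hundred resources then costs one or a handful of requests, which also makes the rate limit much less of a concern.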

There is also a REST API that might be relevant for your use case, https://github.com/dbpedia/ontology-driven-api, which you can set up locally.

P.S.: By the way, we would be happy if the REST API were integrated into the compose setup of Dockerized DBpedia as well, as a separate container/service. If you (or others reading this) have some experience with Docker (Compose), this could be an easy way to contribute to the community.

Cheers,
Johannes

jvwcd · June 5, 2020, 11:15am

Hi Johannes,

Thank you very much for your answer, that really helps me a lot! We will probably look into setting that one up in the coming months, as the rate limit might be hit quite fast in our use case.
We have very limited experience with Docker, but we do have a project running in a containerised environment. If we find any improvements, we’ll make sure to share them via a PR on GitHub!

Jan