Hello DBpedia Folks!
I am continuing my journey with the French DBpedia Chapter, and today's stage is about hosting the DBpedia Live process.
It needs three services to operate:
- a Virtuoso database, which could be based on https://github.com/dbpedia/virtuoso-sparql-endpoint-quickstart but customized by creating graphs and subgraphs via these commands
- the extraction framework
- the live-mirror
The extraction framework produces the changesets that the live-mirror needs to populate Virtuoso.
Before going into that, I wanted to ask you a question about the changesets. The dumps used to be available here: Index of /live/changesets/.
- Is there a particular reason why you no longer provide these data dumps?
Now, going back to the live part of the framework, I have to report that a lot of packages need to be updated in the Maven dependencies. I list them here (I could open issues if you think it useful):
- socket.io-java-client is no longer available on , but since the library is available at https://github.com/fatshotty/socket.io-java-client, it is still possible to build it through JitPack
- mysql-connector-java, if like me you are working with MySQL version 8
- fasterxml.Jackson.core from 2.5 to 2.9.8
- jackson-module-scala_2.11 from 2.5.2 to 2.9.8
After these updates, it is possible to follow the instructions in the DBpedia dev bible:
http://dev.dbpedia.org/DBpedia_Live_Continuous_Extraction
But… I think the documentation could be more explicit about how to configure it.
In fact, when I read your 2017 paper, I understood that the “update stream” mechanism was first handled through OAI, and that the choice was then made to migrate it to RCStream.
The live.ini file reflects that history by offering the choice between the two. Indeed, both can be configured:
;*********************
; OAI Configuration
;*********************
localApiURL = http://live.dbpedia.org/syncw/api.php
oaiUri = http://live.dbpedia.org/syncwiki/Special:OAIRepository
oaiPrefix = oai:live.dbpedia.org:dbpediawiki:
baseWikiUri = http://live.dbpedia.org/syncwiki/
mappingsOAIUri = http://mappings.dbpedia.org/index.php/Special:OAIRepository
mappingsOaiPrefix = oai:fr.wikipedia.org:frwiki:
mappingsBaseWikiUri = http://mappings.dbpedia.org/wiki/
and
;*********************
; FEEDERS
;*********************
feeder.rcstream.enabled = false
feeder.rcstream.room = fr.wikipedia.org
; Specify the namespace code of events you want to be processed
; Full list available at https://en.wikipedia.org/wiki/Wikipedia:Namespace
; Add at least namespace 6 "File:" to process files on commons.wikimedia.org
feeder.rcstream.allowedNamespaces = 0,10,14
; Specify how often the RCStream should try to reconnect (maxRetryCount)
; within a intervall of x minutes (maxRetryCountIntervall)
feeder.rcstream.maxRetryCount = 3
feeder.rcstream.maxRetryCountIntervall = 1
feeder.allpages.enabled = false
feeder.allpages.allowedNamespaces = 0,10,14
feeder.live.enabled = true
feeder.live.pollInterval = 3000
feeder.live.sleepInterval = 1000
feeder.mappings.enabled = true
#feeder.mappings.enabled = false
feeder.mappings.pollInterval = 3000
#feeder.mappings.pollInterval = 2000
feeder.mappings.sleepInterval = 1000
feeder.unmodified.enabled = true
feeder.unmodified.pollInterval = 2000
feeder.unmodified.sleepInterval = 1000
feeder.unmodified.minDaysAgo = 30
feeder.unmodified.chunk = 5000
feeder.unmodified.threshold = 500
feeder.unmodified.sleepTime = 30000
feeder.eventstreams.enabled = true
feeder.eventstreams.allowedNamespaces = 0,10,14
feeder.eventstreams.maxLineSize = 32768
feeder.eventstreams.maxEventSize = 65536
; see https://stream.wikimedia.org/?doc for documentation of the EventStreams API
feeder.eventstreams.baseURL = https://stream.wikimedia.org/v2/stream/
feeder.eventstreams.streams = recentchange
;sleeptime in milliseconds
feeder.eventstreams.sleepTime = 3000
feeder.eventstreams.minBackoffFactor = 2
feeder.eventstreams.maxBackoffFactor = 30
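With this many switches, it helps to see at a glance which feeders a given live.ini actually turns on. Below is a minimal sketch in Python, outside the extraction framework, that lists the `feeder.<name>.enabled` keys set to true. The `[live]` section header and the `enabled_feeders` helper are my own assumptions for illustration: the file has no section headers, so one is prepended to satisfy `configparser`.

```python
import configparser

def enabled_feeders(ini_text):
    """Return the sorted names of feeders whose feeder.<name>.enabled is true."""
    parser = configparser.ConfigParser()
    # Keep key case as written (e.g. maxRetryCount); the default lowercases keys.
    parser.optionxform = str
    # live.ini has no [section] header, so wrap it in a dummy one (assumption).
    parser.read_string("[live]\n" + ini_text)
    names = []
    for key in parser["live"]:
        parts = key.split(".")
        if len(parts) == 3 and parts[0] == "feeder" and parts[2] == "enabled":
            if parser["live"].getboolean(key):
                names.append(parts[1])
    return sorted(names)

# Excerpt of the FEEDERS block above; lines starting with ';' or '#' are comments.
sample = """
;*********************
; FEEDERS
;*********************
feeder.rcstream.enabled = false
feeder.allpages.enabled = false
feeder.live.enabled = true
feeder.mappings.enabled = true
#feeder.mappings.enabled = false
feeder.unmodified.enabled = true
feeder.eventstreams.enabled = true
"""

print(enabled_feeders(sample))
```

On the excerpt above this reports live, mappings, unmodified and eventstreams as enabled, so several feeders run side by side with this configuration.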
- Since the localApiURL no longer exists today, I enabled RCStream by switching the feeder.rcstream.enabled variable to “true”.
I obtained this error:
org.dbpedia.extraction.live.main.Main: An error in the RCStream connection occurred: Error while handshaking Trying to reconnect
- Do you understand why I can query the RCStream API through the GUI but not through the extraction framework?
While working on the question, I also read this on the page API:Recent changes stream - MediaWiki:
"In 2017, wikitech:EventStreams was launched to expose arbitrary stream data over HTTP. This service replaces RCStream (described below)."
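To make the difference concrete: EventStreams delivers events as Server-Sent Events over plain HTTP, where each event carries a `data:` line holding one JSON object. The sketch below is in Python and is not the framework's EventStreamsFeeder. It only illustrates the kind of filtering that `feeder.eventstreams.allowedNamespaces` suggests, using the `server_name` and `namespace` fields of recentchange events. The sample lines are made up, and the one-`data:`-line-per-event handling is a simplification.

```python
import json

ALLOWED_NAMESPACES = {0, 10, 14}   # mirrors feeder.eventstreams.allowedNamespaces
SERVER_NAME = "fr.wikipedia.org"   # assumption: the wiki we want to follow

def iter_changes(lines, server_name=SERVER_NAME, namespaces=ALLOWED_NAMESPACES):
    """Yield recentchange events from SSE lines for one wiki and a set of namespaces."""
    for line in lines:
        # Only "data:" lines carry the JSON payload; skip "event:", "id:", blanks.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        try:
            change = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore partial or malformed payloads
        if (change.get("server_name") == server_name
                and change.get("namespace") in namespaces):
            yield change

def sse(event):
    """Format a dict as a single SSE data line (helper for the made-up sample)."""
    return "data: " + json.dumps(event)

# Made-up sample standing in for what the recentchange stream would send.
sample_lines = [
    "event: message",
    sse({"server_name": "fr.wikipedia.org", "namespace": 0, "title": "Paris"}),
    "",
    sse({"server_name": "en.wikipedia.org", "namespace": 0, "title": "Paris"}),
    "",
    sse({"server_name": "fr.wikipedia.org", "namespace": 1, "title": "Discussion:Paris"}),
    "",
    sse({"server_name": "fr.wikipedia.org", "namespace": 10, "title": "Modèle:Infobox"}),
]

for change in iter_changes(sample_lines):
    print(change["namespace"], change["title"])
```

In a real client the lines would come from a streaming HTTP GET on the URL formed by `feeder.eventstreams.baseURL` and `feeder.eventstreams.streams`.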
So OK, the API is out of date. To fix it, I tried to use the following API:
localApiURL = Aide de l’API MediaWiki — Wikipédia
- Is this a good way to fix it?
But after this modification I got the following errors:
2022-01-28 12:21:31,790 [main] WARN EventStreamsFeeder: Resuming from date: 1970-01-20T00:28:22Z
[Fatal Error] :6:3: The element type "hr" must be terminated by the matching end-tag "</hr>".
2022-01-28 12:21:31,965 [Feeder_FeederLive] WARN org.dbpedia.extraction.live.util.iterators.OAIRecordIterator: org.xml.sax.SAXParseException; lineNumber: 6; columnNumber: 3; The element type "hr" must be terminated by the matching end-tag "</hr>".
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at ORG.oclc.oai.harvester2.verb.HarvesterVerb.harvest(HarvesterVerb.java:260)
at ORG.oclc.oai.harvester2.verb.HarvesterVerb.<init>(HarvesterVerb.java:183)
at ORG.oclc.oai.harvester2.verb.ListRecords.<init>(ListRecords.java:52)
at org.dbpedia.extraction.live.util.iterators.OAIRecordIterator.prefetch(OAIRecordIterator.java:97)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.preparePrefetch(PrefetchIterator.java:40)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.getCurrent(PrefetchIterator.java:50)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.hasNext(PrefetchIterator.java:57)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.getCurrent(PrefetchIterator.java:49)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.next(PrefetchIterator.java:62)
at org.dbpedia.extraction.live.util.iterators.XPathQueryIterator.prefetch(XPathQueryIterator.java:37)
at org.dbpedia.extraction.live.util.iterators.XPathQueryIterator.prefetch(XPathQueryIterator.java:20)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.preparePrefetch(PrefetchIterator.java:40)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.getCurrent(PrefetchIterator.java:50)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.hasNext(PrefetchIterator.java:57)
at org.apache.commons.collections15.iterators.TransformIterator.hasNext(TransformIterator.java:79)
at org.dbpedia.extraction.live.util.iterators.DuplicateFeederItemRemoverIterator.prefetch(DuplicateFeederItemRemoverIterator.java:41)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.preparePrefetch(PrefetchIterator.java:40)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.getCurrent(PrefetchIterator.java:50)
at org.dbpedia.extraction.live.util.iterators.PrefetchIterator.next(PrefetchIterator.java:62)
at org.dbpedia.extraction.live.feeder.OAIFeeder.getNextItems(OAIFeeder.java:56)
at org.dbpedia.extraction.live.feeder.Feeder.run(Feeder.java:111)
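My reading of this trace, as a guess only: the OAI harvester (OAIFeeder, on the Feeder_FeederLive thread) expects an OAI-PMH XML answer, and a message about an unclosed "hr" element is what a strict XML parser typically reports when it is handed an HTML page instead. The actual response body is not in my logs, so the HTML below is a made-up stand-in. The sketch only shows, in Python rather than with Xerces, that such a page fails strict XML parsing while a minimal OAI-PMH envelope passes.

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(text):
    """Return True if text parses as strict XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Made-up HTML error page (assumption): <hr> is never closed, which HTML allows.
html_page = """<html>
<head><title>Error</title></head>
<body>
<center><h1>Error</h1></center>
<hr><center>server</center>
</body>
</html>"""

# Minimal OAI-PMH-style envelope, as an OAI harvester would expect to receive.
oai_page = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords/>"
    "</OAI-PMH>"
)

print(is_well_formed_xml(html_page))  # HTML with an unclosed <hr>
print(is_well_formed_xml(oai_page))   # well-formed XML
```

If this guess is right, opening the configured oaiUri in a browser should show an HTML page rather than OAI-PMH XML.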
- Did you already have to deal with this XML parsing error?
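A separate detail in the first WARN line: the EventStreamsFeeder resumes from 1970-01-20T00:28:22Z. One hypothesis, not a confirmed diagnosis, is a unit mix-up: an epoch timestamp in seconds being read as milliseconds. The arithmetic below is a small Python check of that idea; `to_utc` is just a helper name for this sketch.

```python
from datetime import datetime, timezone

def to_utc(seconds):
    """Convert epoch seconds to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# The date printed in the WARN line of the log.
logged = datetime(1970, 1, 20, 0, 28, 22, tzinfo=timezone.utc)

# Seconds since the epoch for that date.
raw = int(logged.timestamp())
print(raw)                    # 1643302

# If the original value was in seconds but was read as milliseconds, the real
# value starts with the same digits and is about 1000 times larger.
print(to_utc(raw * 1000))     # 2022-01-27 16:46:40+00:00
```

The result lands on the day before the run in the log (2022-01-28), which is at least consistent with a last-processed timestamp stored in seconds and interpreted as milliseconds.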
As usual, I still have many questions to ask you, but I will stop here for the moment and wait for your first feedback.
My best regards,
Célian