I was wondering how the confidence score, the similarity score, and the percentage of second rank are calculated, and how they relate to one another. I would also like to know why the confidence score of each linked entity is not returned in the response.
I first assumed that the confidence score and the similarity score were interchangeable, but when I change the confidence value in my request to the English DBpedia Spotlight service, the similarity scores don’t seem to depend on it. For example:
!curl https://api.dbpedia-spotlight.org/en/annotate \
--data-urlencode \
"text=Let's see if Barack Obama gets returned." \
--data "confidence=0.35" \
-H "Accept: application/json"
Returns:
{
"@text":" Let\u0027s see if Barack Obama gets returned.",
"@confidence":"0.35",
...
"Resources": [
{"@URI":"http://dbpedia.org/resource/Barack_Obama",...,"@similarityScore":"0.9999768670173619","@percentageOfSecondRank":"1.4176930027341333E-5"}
]
}
When I lower the confidence score to 0.30, however, I get more linked entities (as expected), but their similarity scores are in the 0.7* to 0.9* range, which is well above the confidence value I requested. For example:
!curl https://api.dbpedia-spotlight.org/en/annotate \
--data-urlencode \
"text=Let's see if Barack Obama gets returned." \
--data "confidence=0.30" \
-H "Accept: application/json"
Yields:
{
"@text":" Let\u0027s see if Barack Obama gets returned.",
"@confidence":"0.3",
...
"Resources":[
{"@URI":"http://dbpedia.org/resource/Let_Kunovice",...,"@similarityScore":"0.988353121842901","@percentageOfSecondRank":"0.0"},
{"@URI":"http://dbpedia.org/resource/Barack_Obama",...,"@similarityScore":"0.9999768670173619","@percentageOfSecondRank":"1.4176930027341333E-5"},
{"@URI":"http://dbpedia.org/resource/Election",...,"@similarityScore":"0.7815718876846338","@percentageOfSecondRank":"0.15053384173951967"}
]
}
Is the confidence score calculated from both the similarity score and the percentage of second rank (e.g. as some sort of distance)? I am asking because I would like to apply a post-processing filter to the linked entities.
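To make the filtering idea concrete, here is a minimal sketch of what I have in mind on the client side. It only uses the two fields that are returned in the responses above. The function name `filter_resources` and both threshold values are my own placeholders, and the rule itself (high similarity, low percentage of second rank) is an assumption on my part, not a description of how the service derives its confidence.

```python
import json

def filter_resources(response, min_similarity=0.9, max_second_rank=0.1):
    """Keep entities that pass both thresholds.

    Hypothetical helper: the thresholds are arbitrary illustration values,
    not anything defined by the service.
    """
    kept = []
    for resource in response.get("Resources", []):
        # The service returns both scores as strings, so convert them first.
        similarity = float(resource["@similarityScore"])
        second_rank = float(resource["@percentageOfSecondRank"])
        if similarity >= min_similarity and second_rank <= max_second_rank:
            kept.append(resource)
    return kept

# Trimmed copy of the confidence=0.30 response shown above
# (only the fields used by the filter are included).
sample = json.loads("""
{
  "@confidence": "0.3",
  "Resources": [
    {"@URI": "http://dbpedia.org/resource/Let_Kunovice",
     "@similarityScore": "0.988353121842901",
     "@percentageOfSecondRank": "0.0"},
    {"@URI": "http://dbpedia.org/resource/Barack_Obama",
     "@similarityScore": "0.9999768670173619",
     "@percentageOfSecondRank": "1.4176930027341333E-5"},
    {"@URI": "http://dbpedia.org/resource/Election",
     "@similarityScore": "0.7815718876846338",
     "@percentageOfSecondRank": "0.15053384173951967"}
  ]
}
""")

for resource in filter_resources(sample):
    print(resource["@URI"])
```

With these placeholder thresholds, Election is dropped but Let_Kunovice is still kept, because its similarity score is about 0.988 and its percentage of second rank is 0.0. That is part of why I am asking how the scores relate: the similarity score alone does not seem to separate the entity I want from the one I don't.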