Spotlight: low similarity scores / confidence for possessive form in German

JulioNoe · November 20, 2020, 1:54pm

Thank you for your questions.

Why is the similarityScore for ‘Deutschland’ and ‘Sizilien’ quite low (compared to the English form)?

The result depends on the language model. The English language model contains more elements (tokens, URLs, surface forms, pairs, etc.) than the German language model. Because candidate selection and disambiguation can only work with the elements each model provides, the same entity can receive quite different scores in each language.
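To see the difference for yourself, one option is to send comparable sentences to the English and German `/annotate` services and read back the `@similarityScore` of each annotation. The sketch below is a minimal illustration that assumes the public endpoint `https://api.dbpedia-spotlight.org/{lang}/annotate` and its JSON layout (a `Resources` list whose entries carry `@surfaceForm`, `@URI` and `@similarityScore` as strings). A self-hosted server would need its own base URL, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public demo server; replace with your own instance if you host one.
BASE_URL = "https://api.dbpedia-spotlight.org"


def build_annotate_url(lang: str, text: str, confidence: float) -> str:
    """Build a GET URL for the /annotate service of one language model."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return f"{BASE_URL}/{lang}/annotate?{query}"


def extract_scores(response: dict) -> list[tuple[str, str, float]]:
    """Return (surfaceForm, URI, similarityScore) for every annotated resource.

    Attribute values arrive as strings, and the "Resources" key is missing
    when nothing was annotated, hence the .get() default.
    """
    return [
        (r["@surfaceForm"], r["@URI"], float(r["@similarityScore"]))
        for r in response.get("Resources", [])
    ]


def annotate(lang: str, text: str, confidence: float = 0.5) -> list[tuple[str, str, float]]:
    """Query the service (needs network access) and return the extracted scores."""
    request = urllib.request.Request(
        build_annotate_url(lang, text, confidence),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return extract_scores(json.load(resp))
```

For example, `annotate("de", "Die Hauptstadt Deutschlands ist Berlin.", confidence=0.1)` and the equivalent English sentence sent to `annotate("en", ...)` can be compared side by side. A low confidence is used here so that weakly scored annotations are not filtered out before you can inspect them.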

Why are the confidence thresholds so different? The German possessive form is not detected if the confidence parameter is > 0.4 (for English, > 0.8).

For this question, the stemmer algorithm is the most likely cause. We are working on improving this part of DBpedia-Spotlight. If you are interested in this topic, please visit this link for more details.
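When tracking thresholds like the 0.4 vs 0.8 reported above, it can help to measure them the same way for every sentence. A minimal sketch: re-run the same text on a grid of confidence values and record the highest value at which the surface form is still annotated. The `annotate` callable is a hypothetical stand-in for whatever client you use; it only has to return the surface forms Spotlight annotated at a given confidence.

```python
from typing import Callable, Iterable, Optional

# Hypothetical signature: annotate(text, confidence) -> surface forms that were annotated.
Annotator = Callable[[str, float], Iterable[str]]


def max_detecting_confidence(
    annotate: Annotator,
    text: str,
    surface_form: str,
    steps: int = 10,
) -> Optional[float]:
    """Return the highest confidence on a 0.0..1.0 grid at which `surface_form` is annotated.

    Returns None if it is never annotated. Every grid point is checked, so the
    result does not rely on annotations disappearing monotonically as confidence rises.
    """
    best: Optional[float] = None
    for i in range(steps + 1):
        confidence = round(i / steps, 2)
        if surface_form in annotate(text, confidence):
            best = confidence
    return best
```

In practice, `annotate` would wrap a GET request to the `/annotate` service with the `confidence` parameter and return the `@surfaceForm` values from the JSON response. Passing the client in as a function keeps the sweep independent of the network and easy to test.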

Thanks again for your questions; both help us improve DBpedia-Spotlight. If you have any other questions, please don’t hesitate to post them in the forum.