Different results between demo.dbpedia-spotlight.org and local Docker for proper nouns in French


I installed dbpedia-spotlight with Docker and everything runs well, but I get different results locally and on demo.dbpedia-spotlight.org, and I would like to identify where the difference comes from.

  • my Docker image is up to date and configured for my language
  • my Docker image works well for everything else
  • I checked all the parameters used in the API request of the online demo
  • my sentence is the same locally and in the online demo, and it is isolated, not part of a long text, so this is not a context issue
  • it is not a problem of language or French translation; the problem is that Docker and the demo web page give different results for the same language
  • the language is installed on my side following the Docker tutorial

Here is the sentence in French: “Le colonel Kurtz est le personnage fictif joué par Marlon Brando dans le film Apocalypse Now réalisé par Francis Ford Coppola”, which means in English “Colonel Kurtz is the fictional character played by Marlon Brando in the movie Apocalypse Now directed by Francis Ford Coppola”.

The problem I encounter: in my local Docker, the /spot, /candidates and /annotate routes only pick up the term “Ford” and skip both “Francis” and “Coppola”, whereas the online demo correctly finds the complete name “Francis Ford Coppola”. Locally it works well if I drop “Ford” and write “Francis Coppola” directly.
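
For anyone who wants to reproduce the comparison, here is a minimal sketch that sends the same sentence to both endpoints and lists the surface forms found by only one of them. The two base URLs are assumptions (a local container published on port 2222 and the public French endpoint), as is the `confidence` value; the helper names are made up for illustration, and the JSON field names (`Resources`, `@offset`, `@surfaceForm`) are what I would expect from a Spotlight JSON response.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URLs, adjust to your own setup (not taken from the post).
LOCAL_BASE = "http://localhost:2222/rest"
DEMO_BASE = "https://api.dbpedia-spotlight.org/fr"

TEXT = ("Le colonel Kurtz est le personnage fictif joué par Marlon Brando "
        "dans le film Apocalypse Now réalisé par Francis Ford Coppola")


def build_request(base_url, text, confidence=0.5):
    """Build a GET request for the /annotate route asking for JSON."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{base_url}/annotate?{query}",
        headers={"Accept": "application/json"},
    )


def fetch(base_url, text, confidence=0.5):
    """Send the request and decode the JSON body (needs network access)."""
    request = build_request(base_url, text, confidence)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def surface_forms(response):
    """Return the set of (offset, surface form) pairs in a response."""
    # .get() guards against responses that carry no "Resources" key.
    return {
        (int(r["@offset"]), r["@surfaceForm"])
        for r in response.get("Resources", [])
    }


def diff_surface_forms(local_response, demo_response):
    """Split the spotted surface forms into local-only and demo-only lists."""
    local = surface_forms(local_response)
    demo = surface_forms(demo_response)
    return {"local_only": sorted(local - demo), "demo_only": sorted(demo - local)}


def main():
    # Both calls use the same text and parameters, so any difference
    # comes from the service (model, version, configuration), not the request.
    print(diff_surface_forms(fetch(LOCAL_BASE, TEXT), fetch(DEMO_BASE, TEXT)))
```

Calling `main()` with the container running prints the two lists; if the issue described above is present, I would expect “Ford” on the local side and the full name on the demo side, but that depends on the models actually deployed.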

I checked both the GitHub repository of the Docker image and Docker Hub; judging by the last update, the release seems to be the same, so I’m a little out of ideas on how to resolve this.

Thanks for any help

I think I found a lead, even if it does not clearly explain the difference in responses between the demo (fr-fr) and Docker (fr-fr). I’m writing it here in case it helps someone who has the same problem as me.

In French, the convention for naming somebody is to always use only the first given name, even when there are two given names as in “Francis Ford”. People can have two or three given names on their ID card, but that is mostly in well-off families, to please the grandfather, and no one knows about them or uses them. That’s why I think the French spotter works well with “Francis Coppola” without “Ford”. Historically, I think ontologies were first written for civil registries, with a strict approach. Moreover, as if this weren’t complicated enough, we use compound given names separated by a hyphen, like “Jean-Paul”, but such a name is distinct from a civil-registry second given name: it counts as a single given name. (FYI, I tried to spotlight “Francis-Ford” Coppola, without success.)

Now I made an assumption: the online demo of Spotlight first translates the sentence into English, in which case it would detect the English proper-noun convention without error, because proper nouns aren’t translated.

Or the sentence is spotted in several languages and the English API response is used as a fallback for proper nouns.

I slept on the issue and came to the right conclusion: a multi-language pass is the key, and to reduce the distinct results properly without introducing new noise, I should filter with the types parameter set to dbpedia:Person.
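
A rough sketch of what such a multi-language pass could look like, under several assumptions: the same French text is sent to each language endpoint so the character offsets are comparable, the type filter is spelled `DBpedia:Person` in the `types` parameter, and overlapping spans are resolved by keeping the longest one. That last rule is my own heuristic for illustration, not something Spotlight does, and the function names are hypothetical.

```python
import urllib.parse


def annotate_query(text, types=None, confidence=0.5):
    """Build the query string for /annotate, optionally with a types filter."""
    params = {"text": text, "confidence": confidence}
    if types:
        # Assumed spelling of the filter value, e.g. "DBpedia:Person".
        params["types"] = types
    return urllib.parse.urlencode(params)


def spans(response, lang):
    """Turn one JSON response into (start, end, surface, uri, lang) tuples."""
    out = []
    for r in response.get("Resources", []):
        start = int(r["@offset"])
        surface = r["@surfaceForm"]
        out.append((start, start + len(surface), surface, r["@URI"], lang))
    return out


def merge_passes(responses_by_lang):
    """Merge several passes, keeping the longest span where spans overlap."""
    candidates = []
    for lang, response in responses_by_lang.items():
        candidates.extend(spans(response, lang))
    # Longest spans first, then by position in the text.
    candidates.sort(key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for cand in candidates:
        # Accept a candidate only if it overlaps none of the spans kept so far.
        if all(cand[1] <= k[0] or cand[0] >= k[1] for k in kept):
            kept.append(cand)
    return sorted(kept)
```

The idea would be to run the French pass unfiltered, run the other languages with `annotate_query(text, types="DBpedia:Person")`, and pass the decoded JSON responses to `merge_passes` as a dict keyed by language. With this rule a short French hit such as “Ford” is dropped when another pass returns a longer span that covers it.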

I bet running English alone will not be enough, because there may be more naming conventions than the French and English ones, in languages I haven’t discovered yet, which can spread through cultural imitation.

Moreover, names can also be translated between languages, like “Plato” in English and “Platon” in French; there may be more use cases I don’t know about yet.

I bet it is the same for dbpedia:Place; I can imagine a lot of examples with country and city names.

I’ll try it today and mark the post as resolved if it works (unless someone directly confirms or disproves that approach).