Hi @JulioNoe,
thanks for your reply.
Good to hear that. Right now it seems hard to extract useful information when using a confidence parameter below 0.45.
Consider this example:
“Woher sie kommen, wohin sie gehen: Das Schicksal der Umsiedler”
With the confidence parameter set to 0.4, this produces very odd results (for example, "sie" is linked to Angela_Merkel and "der" to Siddhartha_Gautama):
{
  "@text": "Woher sie kommen, wohin sie gehen: Das Schicksal der Umsiedler",
  "@confidence": "0.4",
  "@support": "0",
  "@types": "",
  "@sparql": "",
  "@policy": "whitelist",
  "Resources": [
    {
      "@URI": "http://de.dbpedia.org/resource/Angela_Merkel",
      "@support": "4444",
      "@types": "Wikidata:Q386724,Wikidata:Q234460,Schema:CreativeWork,DBpedia:Work,DBpedia:WrittenWork",
      "@surfaceForm": "sie",
      "@offset": "6",
      "@similarityScore": "0.7618334327883661",
      "@percentageOfSecondRank": "0.11505440921756185"
    },
    {
      "@URI": "http://de.dbpedia.org/resource/Kosovo",
      "@support": "7953",
      "@types": "Wikidata:Q6256,Schema:Place,Schema:Country,DBpedia:PopulatedPlace,DBpedia:Place,DBpedia:Location,DBpedia:Country",
      "@surfaceForm": "kommen",
      "@offset": "10",
      "@similarityScore": "0.9959636611260666",
      "@percentageOfSecondRank": "0.004031950540247078"
    },
    {
      "@URI": "http://de.dbpedia.org/resource/Angela_Merkel",
      "@support": "4444",
      "@types": "Wikidata:Q386724,Wikidata:Q234460,Schema:CreativeWork,DBpedia:Work,DBpedia:WrittenWork",
      "@surfaceForm": "sie",
      "@offset": "24",
      "@similarityScore": "0.7618334327883661",
      "@percentageOfSecondRank": "0.11505440921756185"
    },
    {
      "@URI": "http://de.dbpedia.org/resource/Gehen",
      "@support": "221",
      "@types": "",
      "@surfaceForm": "gehen",
      "@offset": "28",
      "@similarityScore": "0.9999856227980902",
      "@percentageOfSecondRank": "0.0"
    },
    {
      "@URI": "http://de.dbpedia.org/resource/Das_Schicksal",
      "@support": "9",
      "@types": "Wikidata:Q386724,Wikidata:Q11424,Schema:Movie,Schema:CreativeWork,DBpedia:Work,DBpedia:Film",
      "@surfaceForm": "Das Schicksal",
      "@offset": "35",
      "@similarityScore": "0.9999999996082067",
      "@percentageOfSecondRank": "0.0"
    },
    {
      "@URI": "http://de.dbpedia.org/resource/Siddhartha_Gautama",
      "@support": "2928",
      "@types": "Http://xmlns.com/foaf/0.1/Person,Wikidata:Q5,Wikidata:Q24229398,Wikidata:Q215627,DUL:NaturalPerson,DUL:Agent,Schema:Person,DBpedia:Agent,DBpedia:Person",
      "@surfaceForm": "der",
      "@offset": "49",
      "@similarityScore": "0.5831854446108904",
      "@percentageOfSecondRank": "0.4274700030982275"
    },
    {
      "@URI": "http://de.dbpedia.org/resource/Umsiedler",
      "@support": "657",
      "@types": "",
      "@surfaceForm": "Umsiedler",
      "@offset": "53",
      "@similarityScore": "0.9999639826191368",
      "@percentageOfSecondRank": "2.343138361030307E-5"
    }
  ]
}
Are these results related to the stemming algorithm? When I increase the confidence parameter to 0.45, these entries go away, but I lose the ability to detect possessive forms in German.
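In the meantime, one possible stopgap is to keep the confidence at 0.4 and filter the response on the client side. The sketch below is only an illustration under my own assumptions: `filter_resources`, `GERMAN_FUNCTION_WORDS`, and the 0.9 similarity cutoff are hypothetical names and values, not part of DBpedia Spotlight, and the word list is far from complete. It drops annotations whose surface form is a pronoun or article, or whose `@similarityScore` is low, using the field names from the response above.

```python
# Hypothetical client-side post-filter; not part of DBpedia Spotlight.
# The word list and the cutoff are illustrative assumptions only.
GERMAN_FUNCTION_WORDS = {"sie", "er", "es", "wir", "ihr", "der", "die", "das"}


def filter_resources(resources, min_similarity=0.9, stopwords=GERMAN_FUNCTION_WORDS):
    """Keep annotations that are not bare function words and score high enough."""
    kept = []
    for res in resources:
        # Drop annotations whose whole surface form is a pronoun or article.
        if res["@surfaceForm"].lower() in stopwords:
            continue
        # Spotlight returns scores as strings, so convert before comparing.
        if float(res["@similarityScore"]) < min_similarity:
            continue
        kept.append(res)
    return kept


# A reduced copy of the "Resources" list from the response above.
sample = [
    {"@surfaceForm": "sie", "@similarityScore": "0.7618334327883661"},
    {"@surfaceForm": "kommen", "@similarityScore": "0.9959636611260666"},
    {"@surfaceForm": "Das Schicksal", "@similarityScore": "0.9999999996082067"},
    {"@surfaceForm": "der", "@similarityScore": "0.5831854446108904"},
    {"@surfaceForm": "Umsiedler", "@similarityScore": "0.9999639826191368"},
]

kept_forms = [res["@surfaceForm"] for res in filter_resources(sample)]
print(kept_forms)
```

On this example the filter removes the "sie" and "der" annotations but still keeps "kommen" (linked to Kosovo), so it only hides part of the problem and is no substitute for a fix in the model itself.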
Any improvement in this area would be very welcome.
Thanks!