DBpedia Spotlight annotation

Hi there,

I observed that DBpedia Spotlight returns repeated terms, and also returns both the singular and plural forms of the same term.
For example, I passed the following document (up to the end of the abstract only): https://www.nature.com/articles/s41541-020-00219-x#:~:text=The%20LASV%20envelope%20surface%20glycoprotein,design%20of%20an%20LASV%20vaccine. DBpedia Spotlight returned these results:
Adjuvant, Lassa virus, glycoprotein, immunogenic, antibodies, Lassa mammarenavirus (LASV), arenavirus endemic, Lassa fever, viral hemorrhagic fever, neutralizing antibodies, prophylactic, glycoprotein, neutralizing antibodies, antigen, vaccine, Josiah, conformation, immunization, adjuvanted, immunization, antibody, epitopes, immunization, antibodies, polyclonal, neutralizing antibodies

In the above results:

  1. Adjuvant/adjuvanted and antibody/antibodies are the same terms in different forms (e.g. singular and plural). Is there any customization to avoid these?
  2. Repeated terms such as neutralizing antibodies and immunization are being generated. Is there any customization to avoid these?

Hi @shylaja,

Thanks for the questions. DBpedia Spotlight is an annotation tool: it looks for terms, named entities and words that match DBpedia entities through a process of spotting, candidate selection, disambiguation and filtering. An important step is lemmatisation, which reduces words to their base form (for example, the English word “walk” is the base form of “walked”, “walks” and “walking”). Regarding both of your questions:

  1. Adjuvant/adjuvanted and antibody/antibodies are the same terms in different forms (e.g. singular and plural). Is there any customization to avoid these?
  2. Repeated terms such as neutralizing antibodies and immunization are being generated. Is there any customization to avoid these?
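Independently of the Spotlight settings, both kinds of duplicates listed above can also be reduced on the client side by grouping the returned annotations on their resolved DBpedia URI. The sketch below is an illustration, not part of DBpedia Spotlight itself: it assumes the JSON shape of the annotate response (a "Resources" list whose entries carry "@URI" and "@surfaceForm"), and `group_by_entity` is a hypothetical helper name. The sample input is hand-written, not real output for the article above.

```python
from typing import Dict, List


def group_by_entity(response: Dict) -> Dict[str, List[str]]:
    """Group Spotlight annotations by their resolved DBpedia URI.

    Assumes each entry in response["Resources"] has "@URI" and
    "@surfaceForm" keys. Repeated mentions of one entity collapse into a
    single key; distinct surface forms (e.g. singular and plural) are kept
    once each, compared case-insensitively.
    """
    groups: Dict[str, List[str]] = {}  # dicts keep insertion order (Python 3.7+)
    for resource in response.get("Resources", []):
        uri = resource["@URI"]
        form = resource["@surfaceForm"]
        forms = groups.setdefault(uri, [])
        if form.lower() not in {f.lower() for f in forms}:
            forms.append(form)
    return groups


# Illustrative input only: a hand-written response in the assumed shape.
sample = {
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Antibody", "@surfaceForm": "antibodies"},
        {"@URI": "http://dbpedia.org/resource/Immunization", "@surfaceForm": "immunization"},
        {"@URI": "http://dbpedia.org/resource/Antibody", "@surfaceForm": "antibody"},
        {"@URI": "http://dbpedia.org/resource/Immunization", "@surfaceForm": "immunization"},
    ]
}

for uri, forms in group_by_entity(sample).items():
    print(uri, forms)
```

Note that this only merges surface forms that Spotlight linked to the same resource; if, say, "adjuvanted" were linked to a different URI than "Adjuvant", the two would stay separate and would need an extra normalisation step.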

DBpedia Spotlight defines a confidence value in the range 0 to 1; a high confidence threshold instructs DBpedia Spotlight to avoid incorrect annotations as much as possible, at the cost of returning fewer annotations. On demo.dbpedia-spotlight.org you can try different confidence values and choose the one that best fits your purpose. I hope this information helps. Have a great day.
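The same confidence setting can be passed when calling the REST service directly instead of the demo page. This is a minimal sketch under stated assumptions: it assumes the public English endpoint https://api.dbpedia-spotlight.org/en/annotate accepts a form-encoded POST with `text` and `confidence` fields and returns JSON when asked via the Accept header; `build_request` and `annotate` are hypothetical helper names, and a self-hosted instance would use its own base URL.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Assumed public endpoint; replace with your own instance's URL if self-hosting.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text: str, confidence: float, url: str = SPOTLIGHT_URL) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST request for the annotate service."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    body = urllib.parse.urlencode({"text": text, "confidence": confidence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Accept": "application/json",  # ask for JSON instead of the default format
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def annotate(text: str, confidence: float = 0.5, url: str = SPOTLIGHT_URL) -> List[str]:
    """Send the request and return the annotated DBpedia URIs in document order."""
    request = build_request(text, confidence, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # When nothing is annotated, the "Resources" key may be absent.
    return [resource["@URI"] for resource in payload.get("Resources", [])]
```

For example, `annotate(abstract, 0.35)` and `annotate(abstract, 0.7)` can be compared side by side. A higher value typically returns fewer, more certain annotations, but it does not by itself merge repeated mentions of the same entity; each mention is still reported separately.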