Using property values instead of classes for higher data quality?

Hi, I want to model software products for the HITO project ontology and reuse classes from DBpedia for that. However, I had problems finding high-quality classes for concepts like “operating system”, “software license” and “programming language”: properties often don’t have a range, or if they have one, the range is too wide or some of the values don’t belong in the class at all.

For example:

hito:programmingLanguage rdf:type owl:ObjectProperty ;
    rdfs:domain hito:SoftwareProduct ;
    rdfs:range dbo:ProgrammingLanguage ; # Problem: contains e.g. dbr:Freshwater_ecosystem
    rdfs:label "programming language"@en .

This is an issue for me because the users of the ontology use tools to create new instances using lists of possible values and should only be offered values that make sense.
However, what I found to drastically improve quality was to offer the user only those objects o that occur at least twice in triples of the form (s, p, o), where s is an instance of dbo:Software and p is the property in question (e.g. dbo:operatingSystem). But that is not something that makes sense to define in an ontology.
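To make the filter concrete, here is a minimal sketch of the "at least twice" rule in Python. The function name `frequent_objects`, the in-memory triple representation and the sample data are assumptions made for illustration and not part of HITO or DBpedia. The SPARQL string shows roughly how the same rule might be phrased as a query; it has not been tested against the DBpedia endpoint.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as prefixed names

# Rough SPARQL equivalent for one property (assumption: untested sketch).
EXAMPLE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?o (COUNT(?s) AS ?n) WHERE {
  ?s a dbo:Software ;
     dbo:operatingSystem ?o .
}
GROUP BY ?o
HAVING (COUNT(?s) >= 2)
"""


def frequent_objects(
    triples: Iterable[Triple],
    software: Set[str],
    prop: str,
    min_count: int = 2,
) -> Set[str]:
    """Return objects o with at least min_count triples (s, prop, o), s in software."""
    # An RDF graph is a set of triples, so duplicates are dropped before counting.
    counts = Counter(
        o for s, p, o in set(triples) if p == prop and s in software
    )
    return {o for o, n in counts.items() if n >= min_count}


# Hypothetical sample data, not taken from the endpoint.
software = {"dbr:A", "dbr:B", "dbr:C"}
triples = [
    ("dbr:A", "dbo:operatingSystem", "dbr:Linux"),
    ("dbr:B", "dbo:operatingSystem", "dbr:Linux"),
    ("dbr:C", "dbo:operatingSystem", "dbr:Freshwater_ecosystem"),
    ("dbr:D", "dbo:operatingSystem", "dbr:Freshwater_ecosystem"),  # dbr:D is not software
]
options = frequent_objects(triples, software, "dbo:operatingSystem")
print(sorted(options))
```

With this sample data only dbr:Linux passes the threshold: the outlier is used by a single software instance, and the second use comes from a subject that is not typed as software.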

Is this a general trend or did I just stumble upon the rare case that property values have a higher quality than classes? Is there a general approach to handle such cases? Would it be beneficial to DBpedia to create new classes out of those object values?

Details below:

Operating System
dbo:operatingSystem has an rdfs:domain statement (with dbo:Software) but no range specified.
There is a class yago:OperatingSystem106568134, but it is missing some entries like yago:Windows_XP, which does not seem to belong to any other type that only includes operating systems.

Programming Language
Similarly, dbo:programmingLanguage doesn’t have a range either, but at least the class dbo:ProgrammingLanguage exists.
However, that class has strange instances like dbr:Babylonian_law, dbr:Freshwater_ecosystem and dbr:Bipartisanship.
yago:WikicatProgrammingLanguages is working fine, however.

Software License
Next, I was looking for software licenses. However, http://dbpedia.org/ontology/license, which is used by multiple instances of dbo:Software, has domain dbo:Work and no defined range, and dbo:License does not exist. yago:License106549661 contains some software licenses, like Apache, but is missing others, such as Creative Commons licenses.

All values given are from the default DBpedia endpoint as of 2020-04-09.

Hi @KonradHoeffner,

the algorithm you describe is more of a data clean-up procedure for outliers: it assumes that the objects are normally correct, except for some that do not make sense, e.g. freshwater ecosystems. So you implemented a threshold, i.e. at least 2 occurrences, to clean it up.
There are tons of options here:

  1. your method
  2. find the mapping responsible and fix it. You can use http://global.dbpedia.org/?s=$dbpediaurl$p=property to debug across languages and find the template responsible
  3. the main endpoint is several years old; https://databus.dbpedia.org/dbpedia/mappings/ has monthly releases, so the data might have improved already
  4. we partitioned a fused version into properties, incl. all languages and Wikidata, here: https://databus.dbpedia.org/vehnem/flexifusion/fusion/ (see e.g. http://akswnc7.informatik.uni-leipzig.de/release/flexifusion/fusion/2019.12.15/fusion_dbo_operatingSystem.ttl.bz2 for operating systems)
  5. in addition to yago, we are going to load http://caligraph.org/ soon. Maybe this is typed better. It is also on the bus, but there is some junk that needs deletion: https://databus.dbpedia.org/nheist/CaLiGraph