Proposal: Vocabulary-Agnostic Abstraction & Semantic Categorization Layer for NEF

Hi @tsoru and the DBpedia community,

My name is Krishna, a 3rd-year CSE undergrad at Amrita Vishwa Vidyapeetham. I have been closely following the excellent recent PRs on Dockerization and hallucination validation. To complement that infrastructure work, I would like to propose a project that specifically targets the currently unaddressed goals from the project description: Vocabulary Flexibility, Relation Semantics, and Alternative Extraction Targets.

Here is my proposed structure following the required format:

Description

The current Neural Extraction Framework is tightly coupled to the DBpedia vocabulary and extracts relations purely based on embedding similarity. While recent community efforts address scalability, the pipeline still lacks the ability to generalize across other standard vocabularies (SKOS, schema.org, Wikidata). Additionally, the extracted relations are not logically categorized (e.g., symmetric, transitive), and complex implicit relationships like Causality and Issuance remain largely untapped. This project introduces a Vocabulary-Agnostic Abstraction Layer and a Semantic Classifier to the NEF.

Goal

  1. Vocabulary Flexibility: Implement a dynamic schema-routing layer using LLM structured outputs (via Pydantic/function calling) that adapts the extracted RDF predicates to any target ontology (e.g., schema.org, RDFS, Wikidata) provided at runtime, removing hardcoded DBpedia dependencies.
  2. Semantic Categorization: Utilize lightweight HuggingFace NLI models to classify extracted relations by their logical semantics (reflexive, symmetric, transitive, equivalence) to enrich downstream reasoning and graph consistency.
  3. Alternative Extraction Targets: Extend the LLM extraction prompts and logic to identify and structure complex implicit relationships, specifically Causality and Issuance, from dense Wikipedia text.
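To make goal 1 concrete, here is a rough sketch of how the abstraction layer could keep extracted predicates vocabulary-neutral and map them onto a target ontology chosen at runtime. Everything here is hypothetical (the `ExtractedTriple` model, `VOCAB_MAPS`, and `route_predicate` are illustrative names, not NEF code), and I use a stdlib dataclass as a stand-in for the Pydantic model that would actually enforce the LLM's structured output:

```python
# Illustrative sketch only: a vocabulary-neutral triple plus a runtime
# predicate router. In the real design, ExtractedTriple would be a
# Pydantic BaseModel validating the LLM's structured output.
from dataclasses import dataclass


@dataclass
class ExtractedTriple:
    """One extracted fact; the predicate is an abstract label."""
    subject: str
    predicate: str  # e.g. "birthPlace", not yet bound to any vocabulary
    obj: str


# Minimal per-vocabulary predicate maps; in practice these would be
# loaded from external ontology definitions at runtime.
VOCAB_MAPS = {
    "dbpedia": {"birthPlace": "dbo:birthPlace"},
    "schema.org": {"birthPlace": "schema:birthPlace"},
    "wikidata": {"birthPlace": "wdt:P19"},
}


def route_predicate(triple: ExtractedTriple, vocab: str) -> str:
    """Bind the abstract predicate to the target vocabulary's term,
    falling back to the abstract label if no mapping exists yet."""
    return VOCAB_MAPS[vocab].get(triple.predicate, triple.predicate)


t = ExtractedTriple("dbr:Alan_Turing", "birthPlace", "dbr:London")
print(route_predicate(t, "wikidata"))  # wdt:P19
```

The key design choice is that extraction and vocabulary binding are decoupled: the same extracted triple can be serialized against DBpedia, schema.org, or Wikidata without re-running the LLM.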
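For goal 2, the idea is to verbalize each relation's candidate logical properties as natural-language hypotheses and let a zero-shot NLI model score them. The hypothesis templates, label set, and the model name in the commented-out pipeline call below are my own assumptions for discussion, not a settled design:

```python
# Illustrative sketch: turning a relation's logical properties into NLI
# hypotheses for zero-shot classification. Templates and labels are
# assumptions to be refined with the community.
SEMANTIC_LABELS = ["symmetric", "transitive", "reflexive", "equivalence"]


def hypotheses(relation: str) -> dict:
    """Build one candidate hypothesis per logical property of a relation."""
    return {
        "symmetric": f"If A {relation} B, then B {relation} A.",
        "transitive": f"If A {relation} B and B {relation} C, then A {relation} C.",
        "reflexive": f"Every entity {relation} itself.",
        "equivalence": f"The relation '{relation}' groups entities into classes.",
    }


# In the actual pipeline (requires `transformers` and a model download):
# from transformers import pipeline
# clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# result = clf(f"A {relation} B.", candidate_labels=SEMANTIC_LABELS)

print(hypotheses("spouse")["symmetric"])  # If A spouse B, then B spouse A.
```

Since these are lightweight NLI models, classification could run as a cheap post-processing pass over each newly extracted predicate type, caching results per predicate rather than per triple.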

Material

  • The neural-extraction-framework repository (building on the recent Docker/Redis infrastructure updates).
  • LangChain and Pydantic for enforcing dynamic LLM response schemas across different vocabularies.
  • HuggingFace Transformers (NLI models) for zero-shot classification of relation semantics.
  • External ontology definitions (schema.org, Wikidata) and DBpedia Lookup.

Project size(s)

Large (350 hours): This involves architecting the abstraction layer, integrating the semantic classification models, handling complex multi-hop relations (Causality/Issuance), and writing comprehensive unit tests for multi-vocabulary outputs.

Impact

This project will turn the Neural Extraction Framework from a DBpedia-specific utility into a vocabulary-agnostic, broadly reusable knowledge extraction engine. Furthermore, semantic categorization will strengthen the logical consistency and inferencing power of the millions of new statements generated by the pipeline.

Warm-up Tasks & Next Steps

I have familiarized myself with the DBpedia endpoint, cloned the repository, and set up the local environment. I am currently prototyping the dynamic prompt injection for schema.org mapping locally to prove the concept of vocabulary flexibility. I plan to share my initial findings and a draft PR shortly.

I would appreciate any feedback on whether prioritizing this abstraction and semantic categorization aligns well with the immediate 2026 roadmap!

Best regards,

Siva Rama Krishna Reddy Padala

Hi Siva, thanks for your idea submission.

As this is strictly related to the Neural Extraction Framework project, please refer to it from the related page below. Thanks!