Towards Amharic DBpedia

Description

DBpedia is a collaborative initiative that extracts structured information from Wikipedia and publishes it as Linked Open Data. This project continues the work of GSoC 2024, in which we successfully integrated Amharic parsers and extractors into the DBpedia chapter. Due to time constraints, however, we could not add enough mappings to extract data from Wikipedia. This year, we plan to add more mappings, build a robust landing page to present the data, and clean the existing data.

Goal

The primary goal of this project is to enhance the Amharic DBpedia chapter:

  • Extend the existing Amharic DBpedia chapter in the DBpedia knowledge graph with data from Amharic Wikipedia.
  • Add further mappings from Amharic Wikipedia infobox templates to the DBpedia ontology.
  • Extend the DBpedia extraction framework to extract citations, disambiguation, personal data, topical concepts, anchor text, and shared resources from Amharic Wikipedia.
  • Create an automated framework for extraction and mapping.
  • Make the knowledge graph available to end users via a web page.
  • Create documentation for processes, tools, and techniques used for sustainable development, following FAIR principles.
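
The mappings mentioned in the goals above link infobox keys in Wikipedia templates to properties of the DBpedia ontology. The following is a minimal sketch of that idea in Python, not the extraction framework's actual implementation. The Amharic infobox keys, the `INFOBOX_MAPPING` table, the function names, and the `example.org` resource IRI are all hypothetical and chosen only for illustration. For simplicity, every value is emitted as an Amharic language-tagged literal.

```python
# Hypothetical mapping from Amharic infobox keys to ontology property IRIs.
# Both the keys and the chosen properties are illustrative assumptions.
INFOBOX_MAPPING = {
    "ስም": "http://xmlns.com/foaf/0.1/name",
    "የትውልድ ስም": "http://dbpedia.org/ontology/birthName",
}


def map_infobox(subject_iri, infobox, mapping=INFOBOX_MAPPING):
    """Return (subject, predicate, value) triples for every mapped infobox key."""
    triples = []
    for key, value in infobox.items():
        predicate = mapping.get(key.strip())
        if predicate is None:
            continue  # keys without a mapping produce no triple
        triples.append((subject_iri, predicate, value.strip()))
    return triples


def to_ntriples(triples, lang="am"):
    """Serialize triples as N-Triples lines with language-tagged literals."""
    lines = []
    for subject, predicate, value in triples:
        # Escape backslashes, double quotes, and newlines inside the literal.
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'<{subject}> <{predicate}> "{escaped}"@{lang} .')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical infobox content; the last key has no mapping and is skipped.
    infobox = {"ስም": " አበበ ", "ያልታወቀ": "x"}
    triples = map_infobox("http://example.org/resource/Example", infobox)
    print(to_ntriples(triples))
```

In this sketch, an infobox key without an entry in the table is silently skipped, which is why adding more mappings directly increases the amount of data that can be extracted.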

Impact

  • Enable users to access and utilize structured data in Amharic DBpedia more effectively.
  • Promote linguistic diversity and support research, education, and applications that rely on multilingual knowledge graphs.
  • Downstream NLP tasks: Apply DBpedia knowledge graphs to NLP applications such as machine translation and sentiment analysis.
  • Community engagement: Encourage the community to contribute and collaborate in sustaining and expanding Amharic DBpedia.

Warm-up tasks

Please read the following papers:
GitHub Repository

Skills Required

  • A good understanding of Java and Python
  • Optionally, good knowledge of SPARQL, RDF, and other Semantic Web technologies
  • Good documentation and communication skills

Project Size

350 hours

Mentors

Hizkiel Alemayehu

Tilahun Tafa

Ricardo Usbeck

Keywords

Amharic DBpedia, Semantic Web, Extraction Framework