Building the Amharic DBpedia Language Chapter with Large Language Models (LLMs)

Description

DBpedia is a collaborative initiative that extracts structured information from Wikipedia and publishes it as Linked Open Data. This project continues the work done in GSoC 2024 and GSoC 2025, in which we successfully integrated Amharic parsers and extractors into the DBpedia chapter. However, due to time constraints, we could not build a complete automation system to extract and build the artifacts. In this year’s GSoC, we would like to build on that progress.

Goal

The primary goal of this project is to enhance the existing Amharic DBpedia chapter:

  • Integrate an automatic extraction framework and mapping by applying LLMs
    • Class/Property/Relation prediction
  • Build a demo page
  • Update the home page
  • Deploy the knowledge graph and make it available to end users via a web page.
  • Create documentation for processes, tools, and techniques used for sustainable development, following FAIR principles.
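
As a rough illustration of the class prediction goal above, here is a minimal sketch in Python. It is not the project's implementation: the `llm` callable, the helper names `build_prompt` and `predict_class`, and the short candidate list are all assumptions made for this example. The one idea it shows is constraining the model to a closed list of ontology classes and rejecting anything outside that list.

```python
from typing import Callable, Optional, Sequence

# Assumption: a small hand-picked subset of DBpedia ontology classes,
# used here only to keep the example short.
CANDIDATE_CLASSES = ("Person", "Place", "Organisation", "Work", "Species")


def build_prompt(template_name: str, properties: Sequence[str],
                 candidates: Sequence[str] = CANDIDATE_CLASSES) -> str:
    """Build a prompt that asks for exactly one class from a closed list."""
    return (
        "Choose the DBpedia ontology class that best fits this Amharic "
        "Wikipedia infobox template. Answer with one class name only.\n"
        f"Template: {template_name}\n"
        f"Properties: {', '.join(properties)}\n"
        f"Allowed classes: {', '.join(candidates)}"
    )


def predict_class(template_name: str, properties: Sequence[str],
                  llm: Callable[[str], str],
                  candidates: Sequence[str] = CANDIDATE_CLASSES) -> Optional[str]:
    """Return the predicted class, or None if the answer is not a candidate.

    `llm` is a hypothetical callable (prompt in, raw text out); plug in
    whichever model client the project ends up using.
    """
    answer = llm(build_prompt(template_name, properties, candidates)).strip()
    # Accept only names from the closed list, ignoring case, so that
    # hallucinated class names never reach the mapping step.
    by_lower = {c.lower(): c for c in candidates}
    return by_lower.get(answer.lower())
```

For a quick check without a real model, a stub such as `lambda prompt: "person"` can be passed as `llm`; an answer outside the candidate list yields `None`, which a pipeline could route to manual review.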

Impact

  • Enable users to access and utilize structured data in Amharic DBpedia more effectively.
  • Promote linguistic diversity and support research, education, and applications that rely on multilingual knowledge graphs.
  • Downstream NLP tasks: Apply knowledge graphs from DBpedia to NLP applications such as machine translation and sentiment analysis.
  • Community engagement: Encourage the community to contribute and collaborate to sustain and expand Amharic DBpedia.

Warmup Tasks

  • Read the documentation for Amharic DBpedia at https://github.com/AmharicDBpedia/AmharicDBpediaChapter/wiki
  • Amharic Wikipedia

Skills Required

  • A good understanding of Java and Python
  • Optionally, good knowledge of SPARQL, RDF, and other Semantic Web technologies
  • Machine Learning
  • Good documentation and communication skills

Project Size

350 hours

Mentors

Hizkiel Alemayehu

Tilahun Tafa

Ricardo Usbeck

Andargachew Asfaw

Keywords

Amharic DBpedia, Semantic Web, Extraction Framework

Hiiii @hizclick and mentors,

I’ve been going through the warmup tasks: I read through the AmharicDBpedia wiki, explored the Amharic Wikipedia structure, and looked at the Arabic, Korean, and German DBpedia chapters to understand how other language editions handle similar challenges.

Really interested in working on the LLM-based extraction and mapping automation this year. The continuation from GSoC 2024/2025 makes a lot of sense.

Could you please advise on where you prefer a pre-proposal draft to be shared, here on the forum or via email, so I can align my approach with your expectations?

Thanks!