DBpedia Amharic Chapter — GSoC 2024

Towards Amharic DBpedia

Description

DBpedia is a collaborative initiative that extracts structured information from Wikipedia and publishes it as Linked Open Data. While well-resourced languages such as English and German have dedicated DBpedia chapters, low-resource languages such as Amharic remain underrepresented. Amharic, the official language of Ethiopia, is spoken by millions of people worldwide, yet it lacks its own DBpedia chapter. This project aims to create an Amharic DBpedia chapter, making Amharic the first sub-Saharan African language to join DBpedia's internationalization efforts and paving the way for other African languages to follow. The task is therefore to extract, process, and integrate information from Amharic Wikipedia into DBpedia.
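
To make the extract, process, and integrate steps concrete, here is a minimal Python sketch that turns a toy infobox into N-Triples. It is only an illustration, not the DBpedia extraction framework: the Amharic infobox keys, the MAPPING table, the helper names, and the http://am.dbpedia.org/resource/ namespace are all assumptions made for this example.

```python
import re

# Hypothetical mapping from Amharic infobox keys to DBpedia ontology properties.
# The keys are illustrative; real mappings are maintained separately.
MAPPING = {
    "ዋና ከተማ": "http://dbpedia.org/ontology/capital",
    "የሕዝብ ብዛት": "http://dbpedia.org/ontology/populationTotal",
}

RESOURCE_BASE = "http://am.dbpedia.org/resource/"  # assumed chapter namespace
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def parse_infobox(wikitext):
    """Return {key: value} for simple '| key = value' lines in an infobox."""
    fields = {}
    for line in wikitext.splitlines():
        match = re.match(r"^\s*\|\s*([^=]+?)\s*=\s*(.*?)\s*$", line)
        if match and match.group(2):
            fields[match.group(1)] = match.group(2)
    return fields


def to_object(value):
    """Render a raw infobox value as an N-Triples object term."""
    link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
    if link:  # a wiki link becomes a resource IRI
        return "<" + RESOURCE_BASE + link.group(1).strip().replace(" ", "_") + ">"
    if re.fullmatch(r"[0-9]+", value):  # ASCII digits become an integer literal
        return '"' + value + '"^^<' + XSD_INTEGER + ">"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"@am'  # anything else becomes an Amharic literal


def to_ntriples(title, fields):
    """Emit one N-Triples line per infobox field that has a mapping."""
    subject = "<" + RESOURCE_BASE + title.replace(" ", "_") + ">"
    triples = []
    for key, value in fields.items():
        prop = MAPPING.get(key)
        if prop is None:
            continue  # unmapped keys are skipped in this sketch
        triples.append(subject + " <" + prop + "> " + to_object(value) + " .")
    return triples


if __name__ == "__main__":
    sample = "{{Infobox country\n| ዋና ከተማ = [[አዲስ አበባ]]\n| የሕዝብ ብዛት = 120000000\n}}"
    for triple in to_ntriples("ኢትዮጵያ", parse_infobox(sample)):
        print(triple)
```

Real infoboxes contain nested templates, units, and references, so the actual extraction is considerably more involved than this line-based parser.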

Goal

The primary goal of this project is to create an Amharic DBpedia chapter, reachable at am.dbpedia.org:

  • Create an Amharic DBpedia chapter in the DBpedia knowledge graph with data from Amharic Wikipedia.
  • Extend the DBpedia extraction framework to extract citations, disambiguation, personal data, topical concepts, anchor text, and shared resources from Amharic Wikipedia.
  • Create Amharic DBpedia mappings following the DBpedia ontology mapping guidelines.
  • Make the knowledge graph available to end users via a web page.
  • Create a SPARQL endpoint to make it queryable.
  • Document the processes, tools, and techniques used, following FAIR principles, to support sustainable development.
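
As a rough idea of what "queryable" could look like for a client, the Python sketch below builds a SPARQL 1.1 Protocol GET request using only the standard library. The endpoint URL https://am.dbpedia.org/sparql and the resource IRI in the sample query are assumptions, since the chapter does not exist yet; build_query_url and run_query are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint location; the Amharic chapter is not live yet.
ENDPOINT = "https://am.dbpedia.org/sparql"

# Sample query: fetch labels of an assumed resource IRI.
QUERY = """
SELECT ?label WHERE {
  <http://am.dbpedia.org/resource/ኢትዮጵያ>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5
"""


def build_query_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query):
    """Send the query and return the parsed SPARQL JSON results."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call run_query once an endpoint is available.
    print(build_query_url(ENDPOINT, QUERY))
```

Until the Amharic endpoint is available, the same helpers can be pointed at any other SPARQL endpoint for testing.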

Impact

  • Enable users to access and use structured data in Amharic DBpedia more effectively.
  • Promote linguistic diversity and support research, education, and applications that rely on multilingual knowledge graphs.
  • NLP downstream tasks: Apply knowledge graphs from DBpedia to downstream NLP tasks such as machine translation and sentiment analysis.
  • Community Engagement: Encourage the community to contribute and collaborate in sustaining and expanding Amharic DBpedia.

Warm-up tasks

Please read the following papers:

  • Amharic Wikipedia
  • Arabic DBpedia
  • Korean DBpedia
  • German DBpedia

Skills Required

  • A good understanding of Java and Python
  • Optionally, good knowledge of SPARQL, RDF, and other Semantic Web technologies
  • Good documentation and communication skills

Project Size

350 hrs

Mentors

  • Hizkiel Alemayehu
  • Tilahun Tafa
  • Ricardo Usbeck

Keywords

Amharic DBpedia, Semantic Web, Extraction Framework

Hi @hizclick,

Are you looking for a native Amharic speaker, or can anyone who is interested contribute?

Hi @wasifferoze,
Would you kindly take a moment to look at GitHub - dice-group/Amharic_DBpedia_Chapter for a quick warm-up?

Hey @hizclick,
I am really interested in this chapter, and I'm currently reading the paper on Korean DBpedia listed on GitHub. Are there any warm-up issues or contributions that I can work on after going through the warm-up resources?

Hi Team, I have been reading the links on GitHub (GitHub - dice-group/Amharic_DBpedia_Chapter). I am interested in being part of this project.

I have finished reviewing the provided resources and understand the workflow for creating ontologies from Wikipedia for DBpedia. My understanding is that Amharic DBpedia would involve a mix of automatic and manual extraction. I would love to contribute to this project and look forward to hearing from you.

Hi @sumedh @Abenezer @wasifferoze, thank you for showing interest. We have included additional details on the GitHub repository (GitHub - dice-group/Amharic_DBpedia_Chapter).
Please start preparing your drafts in a Google Doc. Once you're satisfied with your draft, kindly share it with hizclick@gmail.com.

Hi @hizclick, I am also interested. I will be working on the to-dos from the GitHub link and will ask questions here if I have any.

Hi @hizclick, I have experience with knowledge graphs and identity graph resolution: I have implemented a data pipeline to extract, process, ingest, and analyse data for a graph database. I know both Java and Python, and my understanding of SPARQL, RDF, and Gremlin will also be helpful.

Hi @Abel, @gh0St, thank you for your interest. Let us know if you have questions. For now, you can start preparing your proposals.

Hi @hizclick, is it possible to share the proposal early so that it can be reviewed before I submit it on March 18? If so, please share the email address I should use to invite you to comment in Google Docs.

Hi @Abenezer, definitely. Please share the Google Doc with hizclick@gmail.com.

I have completed reviewing the provided resources as a quick warm-up and have grasped the workflow effectively. I am eager to contribute to the project. You can reach me via email at abatejemal@gmail.com.