DBpedia Hindi Chapter — GSoC 2024

Description

DBpedia is an open knowledge graph in continuous evolution. Unlike Wikidata, where the RDF content is directly edited as a wiki, DBpedia relies strictly on Wikipedia, meaning that every single triple in DBpedia — except for ontology statements — can be traced back to some infobox, sentence or table cell in Wikipedia.

The graph exposed at the root domain of DBpedia is derived solely from English Wikipedia (e.g. https://dbpedia.org/page/India). The purpose of this project is to create a graph derived solely from Hindi Wikipedia. Triples can be generated either with the Extraction Framework, which extracts them from infoboxes, or with novel NLP-based approaches such as the Neural Extraction Framework. Unfortunately, the latter approach currently supports only English. We thus welcome NLP and/or LLM-based solutions that target multilingual text.
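To make the idea of infobox-derived triples concrete, here is a toy sketch in Python. It is not the DBpedia Extraction Framework and does not use its mappings. The sample wikitext, the helper names `parse_infobox` and `to_ntriples`, and the `hi.dbpedia.org` namespaces are all assumptions made for illustration, since the Hindi chapter does not exist yet.

```python
import re

# Assumed namespaces, modelled on how other language chapters name resources.
RESOURCE_NS = "http://hi.dbpedia.org/resource/"
PROPERTY_NS = "http://hi.dbpedia.org/property/"

# Hypothetical infobox snippet, written by hand for this example.
SAMPLE_WIKITEXT = """{{Infobox country
| राजधानी = [[नई दिल्ली]]
| मुद्रा = भारतीय रुपया
}}"""


def parse_infobox(wikitext):
    """Collect 'key = value' pairs from lines that start with a pipe."""
    pairs = {}
    for line in wikitext.splitlines():
        match = re.match(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if match and match.group(2):
            pairs[match.group(1)] = match.group(2)
    return pairs


def to_ntriples(title, pairs):
    """Turn infobox pairs into N-Triples lines about the page `title`."""
    subject = RESOURCE_NS + title.replace(" ", "_")
    lines = []
    for key, value in pairs.items():
        predicate = PROPERTY_NS + key.replace(" ", "_")
        # A value that is exactly one wiki link becomes a resource IRI.
        link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
        if link:
            target = link.group(1).strip().replace(" ", "_")
            obj = "<" + RESOURCE_NS + target + ">"
        else:
            # Anything else becomes a Hindi-tagged string literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"' + escaped + '"@hi'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines
```

Calling `to_ntriples("भारत", parse_infobox(SAMPLE_WIKITEXT))` yields one triple per infobox row, each traceable to a specific line of the page. The real framework additionally handles templates, units, dates and ontology mappings, none of which this sketch attempts.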

Goal

Create the DBpedia Chapter for the Hindi language, to be reached at hi.dbpedia.org. In particular:

  • Create the knowledge graph with data from Hindi Wikipedia.
  • Expose the knowledge graph to make it browsable via the web.
  • Create a SPARQL endpoint to make it queryable.

Material

See Warm-up tasks.

Project size

This project is medium-sized (175 hours).

Impact

  • Cultural and Educational Enrichment: Empower Hindi-speaking users with culturally relevant and easily accessible knowledge, fostering educational enrichment and linguistic inclusivity.
  • Semantic Search and NLP Applications: Enable advanced semantic search and natural language processing (NLP) applications in Hindi, opening avenues for innovation in information retrieval and analysis.
  • Community Engagement: Encourage community contributions, feedback, and collaboration in maintaining and expanding the Hindi ontology, ensuring continuous improvement and relevance.

In summary, this project seeks to contribute significantly to linguistic diversity in the semantic web domain by extending the DBpedia ontology to Hindi, promoting a more inclusive and accessible knowledge landscape for Hindi-speaking users.

Warm-up tasks

  1. Please carefully read our overview on creating new DBpedia Chapters.
  2. Read the paper Internationalization of Linked Data: The case of the Greek DBpedia edition by Kontokostas et al.
  3. Learn about the DBpedia Extraction Framework, the software used to transform Wikipedia infobox data into RDF triples.
  4. Check the mapping in Hindi of the DBpedia ontology.
  5. Go through the list of current chapters, which can be found at this address, to get an idea of how they are structured.
  6. Get familiar with SPARQL on the DBpedia endpoint.
  7. Run a local DBpedia Virtuoso endpoint.
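For tasks 6 and 7, a minimal Python sketch of querying a SPARQL endpoint over HTTP may help. It uses only the standard library. The helper names are hypothetical, the `format` parameter is the one Virtuoso endpoints accept for choosing the result type, and `http://localhost:8890/sparql` is assumed to be the address of a local Virtuoso instance running with default settings.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
# Assumption: a local Virtuoso instance listening on its default HTTP port.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"


def label_query(resource_iri, lang):
    """Build a SELECT query for the labels of a resource in one language."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{resource_iri}> rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urllib.parse.urlencode(params)


def run_query(endpoint, query, timeout=30):
    """Send the query and return the bound ?label values as strings."""
    url = build_request_url(endpoint, query)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return [row["label"]["value"] for row in payload["results"]["bindings"]]
```

For example, `run_query(DBPEDIA_ENDPOINT, label_query("http://dbpedia.org/resource/India", "en"))` asks the public endpoint for the English labels of India. Swapping in `LOCAL_ENDPOINT` sends the same query to a local instance, and the result may be empty if no label in the requested language has been loaded.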

Mentors

Sanju Tiwari (@tiwarisanju18), Ronak Panchal, TBD

Hello @tsoru.
I am interested in working on this project! I have recently studied and worked on NLP, and I am currently looking for an LLM-based project.

A recent research project of mine involved exploring LSTM, BERT and word vectors for text identification and tokenization, which could help recognize sensitive information in real-time video.

Would be amazing to collaborate.

Hi @tsoru @tiwarisanju18 Ronak Panchal,

I am Debarghya Datta (Masters in CS, specializing in NLP). During my masters, I worked on domain-specific Entity Linking with a custom KG, and tackled the challenges of using a neural model for it, as most of the SOTA EL models (GENRE, BLINK) are trained on English corpora.

Due to my previous experience, I find this project particularly interesting and want to try out how LLMs, along with the awesome NLP tools created by the community, can be used effectively to solve this problem.

Please let me know if you have any questions!

GitHub: deba-iitbh (Debarghya Datta)
LinkedIn: Debarghya Datta, Teaching Assistant, Indian Institute of Technology, Bhilai

@tsoru

Hello,

I’m highly interested in the GSoC 2024 project focusing on Semantic Web and NLP for DBpedia. With recent research publications in NLP and hands-on experience, I’m eager to contribute to empowering Hindi-speaking users with culturally relevant knowledge. I am excited about the opportunity to collaborate and make a meaningful impact.

Best Regards
Aryan Dwivedi

Hi @AnanyaD @deba-iitbh @StarfishCode, and thanks for your interest.

Please direct your queries to @tiwarisanju18 who is the main mentor for this project.

Hello @tiwarisanju18, Ronak Panchal,
I am Rishit Agarwal, a bachelor’s student. I recently worked in the web semantics domain, using NLP and BERT models for tasks such as classification and for studying the reasoning of LLMs using RDF syntax.
I was looking for projects that match my interests in NLP tools and web semantics, and I believe this project would help deepen my understanding of the domain.

Looking forward to the amazing opportunity.

Regards
Rishit Agarwal

I am highly interested in this project. I have attached my resume for your reference. Kindly go through it to assess my skills and experience.

Resume
LinkedIn profile

@tiwarisanju18

Hello @tiwarisanju18 and fellow mentors, I am pursuing a Master of Science in Computer Science at Illinois Institute of Technology, Chicago. I am highly interested in this project. I have been part of FOSS Overflow 2024, organised by IIT Bhilai, and this opportunity also aligns with my interest in continuing to contribute to impactful open source projects.

This semester I have taken the Information Retrieval elective, which essentially forms the base of NLP and is thus closely related to the fundamentals of this project.

As an international student currently in the USA, I understand the value of the preservation of vernacular language and culture and am excited to find a way to contribute to my roots.

I can get started on the warm-up tasks as soon as possible to demonstrate my suitability for the project.

Hi @tiwarisanju18!!

I am Yash Srivastava. I have worked on NLP and MT previously, and have good experience working with Indic languages. I really like this problem and would be glad to work on it. I also have good programming and open source experience.

Know more about me and my work: Website, GitHub

Would love to discuss this further!!

Dear All,

Thank you so much for showing interest in the GSoC project. Please go through the warm-up tasks and read them carefully; we will notify you soon.

Thank you

@tiwarisanju18 I’m new to the community. Could you tell me where I can find the warm-up tasks?

Dear Vijay,
Thank you for showing interest.
Please read the following links carefully:

Warm-up tasks

  1. Please carefully read our overview on creating new DBpedia Chapters.
  2. Read the paper Internationalization of Linked Data: The case of the Greek DBpedia edition by Kontokostas et al.
  3. Learn about the DBpedia Extraction Framework, the software used to transform Wikipedia infobox data into RDF triples.
  4. Check the mapping in Hindi of the DBpedia ontology.
  5. Go through the list of current chapters, which can be found at this address, to get an idea of how they are structured.
  6. Get familiar with SPARQL on the DBpedia endpoint.
  7. Run a local DBpedia Virtuoso endpoint.

Hello @tsoru, I am Shivam, a pre-final-year student at Guru Govind Singh Indraprastha University (Delhi), pursuing Electronics and Communication. I am eager to participate in this project. I have good programming skills. Thank you.