DBpedia Hindi Chapter — GSoC 2024

tsoru · February 5, 2024, 3:37pm

Description

DBpedia is an open knowledge graph in continuous evolution. Unlike Wikidata, where the RDF content is directly edited as a wiki, DBpedia relies strictly on Wikipedia, meaning that every single triple in DBpedia — except for ontology statements — can be traced back to some infobox, sentence or table cell in Wikipedia.

The graph exposed at the root domain of DBpedia is derived solely from English Wikipedia (e.g. https://dbpedia.org/page/India). The purpose of this project is to create a graph derived solely from Hindi Wikipedia. Triples can be generated either with the Extraction Framework, which handles infobox extraction, or with novel NLP-based approaches such as the Neural Extraction Framework. Unfortunately, the latter currently supports only English. We thus welcome NLP and/or LLM-based solutions that target multilingual text.
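To give a rough feel for the infobox route, here is a toy Python sketch that turns already-parsed infobox key/value pairs into N-Triples lines. It is not how the Extraction Framework works internally; the two namespace URLs follow the pattern used by other language chapters and are assumptions, and the function names are hypothetical.

```python
# Toy illustration only: the real Extraction Framework uses mappings and
# parsers that are far more involved than this.

RESOURCE_NS = "http://hi.dbpedia.org/resource/"  # assumed namespace
PROPERTY_NS = "http://hi.dbpedia.org/property/"  # assumed namespace


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples string literals require."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def infobox_to_ntriples(page_title: str, infobox: dict, lang: str = "hi") -> list:
    """Emit one language-tagged literal triple per infobox key/value pair."""
    subject = f"<{RESOURCE_NS}{page_title.replace(' ', '_')}>"
    lines = []
    for key, value in infobox.items():
        predicate = f"<{PROPERTY_NS}{key.replace(' ', '_')}>"
        literal = f'"{escape_literal(value)}"@{lang}'
        lines.append(f"{subject} {predicate} {literal} .")
    return lines


if __name__ == "__main__":
    # Hypothetical infobox content for the Hindi article on India.
    for line in infobox_to_ntriples("भारत", {"राजधानी": "नई दिल्ली"}):
        print(line)
```

Each output line keeps the page title in its subject, which is what makes a triple traceable back to its Wikipedia source.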

Goal

Create the Hindi-language DBpedia Chapter, reachable at hi.dbpedia.org. In particular:

Create the knowledge graph with data from Hindi Wikipedia.
Expose the knowledge graph to make it browsable via the web.
Create a SPARQL endpoint to make it queryable.

Material

See Warm-up tasks.

Project size

This project is medium-sized (175 hours).

Impact

Cultural and Educational Enrichment: Empower Hindi-speaking users with culturally relevant and easily accessible knowledge, fostering educational enrichment and linguistic inclusivity.
Semantic Search and NLP Applications: Enable advanced semantic search and natural language processing (NLP) applications in Hindi, opening avenues for innovation in information retrieval and analysis.
Community Engagement: Encourage community contributions, feedback, and collaboration in maintaining and expanding the Hindi ontology, ensuring continuous improvement and relevance.

In summary, this project seeks to contribute significantly to linguistic diversity in the semantic web domain by extending the DBpedia ontology to Hindi, promoting a more inclusive and accessible knowledge landscape for Hindi-speaking users.

Warm-up tasks

Please read carefully our overview on creating new DBpedia Chapters.
Read the paper Internationalization of Linked Data: The case of the Greek DBpedia edition by Kontokostas et al.
Learn about the DBpedia Extraction Framework, the software used to transform Wikipedia infobox data into RDF triples.
Check the mapping in Hindi of the DBpedia ontology.
Go through the list of current chapters, which can be found at this address, to get an idea of how they are structured.
Get familiar with SPARQL on the DBpedia endpoint.
Run a local DBpedia Virtuoso endpoint.
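
For the SPARQL warm-up, a minimal Python sketch of querying an endpoint over HTTP might look like the following. The endpoint URL https://dbpedia.org/sparql and the helper names are assumptions made for illustration; the request uses the standard `query` parameter and asks for SPARQL JSON results through the `Accept` header.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint URL

# Ask for the Hindi label of the English DBpedia resource for India.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/India> rdfs:label ?label .
  FILTER (lang(?label) = "hi")
}
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a GET request that asks the endpoint for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_values(results: dict, variable: str) -> list:
    """Collect the values bound to one variable in a SPARQL JSON result."""
    bindings = results["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and decode the JSON response (needs network access)."""
    with urlopen(build_request(endpoint, query), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    print(extract_values(run_query(ENDPOINT, QUERY), "label"))
```

The same helpers should work against a local Virtuoso instance by changing `ENDPOINT`; a default installation commonly serves SPARQL at http://localhost:8890/sparql, but check your own configuration.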

Mentors

Sanju Tiwari (@tiwarisanju18), Ronak Panchal, TBD

AnanyaD · February 6, 2024, 1:30pm

Hello @tsoru.
I am interested in working on this project! I have recently studied and worked on NLP, and am also looking for an LLM-based project.

A recent research project of mine involved exploring LSTM, BERT and word vectors for text identification and tokenization, which could help recognize sensitive information in real-time video.

Would be amazing to collaborate.

deba-iitbh · February 6, 2024, 1:49pm

Hi @tsoru @tiwarisanju18 Ronak Panchal,

I am Debarghya Datta (Master's in CS, specializing in NLP). During my master's, I worked on domain-specific entity linking with a custom KG and tackled the challenges of using a neural model for it, as most SOTA EL models (GENRE, BLINK) are trained on English corpora.

Given my previous experience, I find this project particularly interesting and want to explore how LLMs, along with the awesome NLP tools created by the community, can be used effectively to solve this problem.

Please let me know if you have any questions!

GitHub: deba-iitbh (Debarghya Datta)
LinkedIn: Debarghya Datta, Teaching Assistant, Indian Institute of Technology, Bhilai

StarfishCode · February 6, 2024, 2:02pm

@tsoru

Hello,

I’m highly interested in the GSoC 2024 project focusing on Semantic Web and NLP for DBpedia. With recent research publications in NLP and hands-on experience, I’m eager to contribute to empowering Hindi-speaking users with culturally relevant knowledge. I am excited about the opportunity to collaborate and make a meaningful impact.

Best Regards
Aryan Dwivedi

tsoru · February 6, 2024, 2:06pm

Hi @AnanyaD @deba-iitbh @StarfishCode, and thanks for your interest.

Please direct your queries to @tiwarisanju18 who is the main mentor for this project.

rishitagarwal2404 · February 6, 2024, 2:45pm

Hello @tiwarisanju18, Ronak Panchal,
I am Rishit Agarwal, a bachelor’s student. I recently worked in the web semantics domain, using NLP and BERT models for tasks such as classification, and studying the reasoning of LLMs using RDF syntax.
I was looking for projects aligned with my interests in NLP tools and web semantics, and I believe this project could help deepen my understanding of the domain.

Looking forward to the amazing opportunity.

Regards
Rishit Agarwal

Injector_Ash · February 6, 2024, 3:23pm

I am highly interested in this project. I have attached my resume in this chat for your reference. Kindly go through it to assess my skills and experience.

Resume
LinkedIn profile

StarfishCode · February 6, 2024, 4:41pm

@tiwarisanju18

arup_chauhan · February 7, 2024, 9:10am

Hello @tiwarisanju18 and fellow mentors, I am pursuing a Master of Science in Computer Science at the Illinois Institute of Technology, Chicago. I am highly interested in this project. I took part in FOSS Overflow 2024, organised by IIT Bhilai, and this opportunity aligns with my interest in continuing to contribute to impactful open source projects.

This semester I have taken the Information Retrieval elective, which essentially forms the base of NLP and thus relates to the fundamentals of this project.

As an international student currently in the USA, I understand the value of the preservation of vernacular language and culture and am excited to find a way to contribute to my roots.

I can get started with the warm-up tasks as soon as possible to showcase my suitability for the project.

yash-srivastava19 · February 7, 2024, 5:58pm

Hi @tiwarisanju18!!

I am Yash Srivastava. I have previously worked in NLP and MT, and have good experience working with Indic languages. I really liked this problem and would be open to working on it. I also have good programming and open source experience.

More about me and my work: Website, GitHub

Would love to discuss this further!!

tiwarisanju18 · February 8, 2024, 7:24pm

Dear all,

Thank you so much for showing your interest in the GSoC project. Please go through the warm-up tasks and read them carefully; we will notify you soon.

Thank you

nikkvijay32 · February 9, 2024, 7:32am

@tiwarisanju18 I’m new to the community. Could you tell me where I can find the warm-up tasks?

tiwarisanju18 · February 9, 2024, 7:37am

Dear Vijay,
Thank you for showing interest.
Please read the following links carefully:

Warm-up tasks

Please read carefully our overview on creating new DBpedia Chapters.
Read the paper Internationalization of Linked Data: The case of the Greek DBpedia edition by Kontokostas et al.
Learn about the DBpedia Extraction Framework, the software used to transform Wikipedia infobox data into RDF triples.
Check the mapping in Hindi of the DBpedia ontology.
Go through the list of current chapters, which can be found at this address, to get an idea of how they are structured.
Get familiar with SPARQL on the DBpedia endpoint.
Run a local DBpedia Virtuoso endpoint.

kumarshivam · February 18, 2024, 5:50pm

Hello @tsoru, I am Shivam, a pre-final-year student at Guru Gobind Singh Indraprastha University (Delhi) pursuing Electronics and Communication. I am eager to participate in this project. I have good programming skills. Thank you.

tiwarisanju18 · February 21, 2024, 11:10am

Dear @kumarshivam,

Thank you for your interest. Please read the warm-up tasks:

Warm-up tasks

Please read carefully our overview on creating new DBpedia Chapters.
Read the paper Internationalization of Linked Data: The case of the Greek DBpedia edition by Kontokostas et al.
Learn about the DBpedia Extraction Framework, the software used to transform Wikipedia infobox data into RDF triples.
Check the mapping in Hindi of the DBpedia ontology.
Go through the list of current chapters, which can be found at this address, to get an idea of how they are structured.
Get familiar with SPARQL on the DBpedia endpoint.
Run a local DBpedia Virtuoso endpoint.

tiwarisanju18 · February 23, 2024, 1:50pm

Dear @rishitagarwal2404 @kumarshivam @nikkvijay32 @yash-srivastava19 @arup_chauhan @StarfishCode @Injector_Ash @deba-iitbh @AnanyaD @Ronak Panchal

Please share your proposal in the following format:
http://tommaso-soru.it/files/misc/Akshay-DBpedia-GSoC-2017-proposal.pdf

Please share the proposal as a Google Doc with: shodhguru21@gmail.com

Timeline: Google Summer of Code 2024 | Google for Developers

Thank You

StarfishCode · February 23, 2024, 2:31pm

Thank you for sharing this. I have a small request: could you provide more reference material, such as technical papers? Anything helps.

Thank You
Best Regards

tiwarisanju18 · February 23, 2024, 2:33pm

@StarfishCode
Please take some idea.

AnanyaD · February 23, 2024, 5:55pm

@tiwarisanju18

Thank you for the reply. Can we get to know more about the literature review regarding the topic, previous papers for reference and further research?

tiwarisanju18 · February 25, 2024, 12:01pm

Please find a reference here:

Dimitris Kontokostas, Charalampos Bratsas, Sören Auer, Sebastian Hellmann, Ioannis Antoniou, George Metakides, Internationalization of Linked Data: The case of the Greek DBpedia edition, Web Semantics: Science, Services and Agents on the World Wide Web, Volume 15, September 2012, Pages 51–61, ISSN 1570–8268, 10.1016/j.websem.2012.01.001.