DBpedia is an open knowledge graph in continuous evolution. Unlike Wikidata, where the RDF content is directly edited as a wiki, DBpedia relies strictly on Wikipedia, meaning that every single triple in DBpedia — except for ontology statements — can be traced back to some infobox, sentence or table cell in Wikipedia.
The graph exposed at the root domain of DBpedia (e.g. https://dbpedia.org/page/India) is derived solely from English Wikipedia. The purpose of this project is to create a graph derived solely from Hindi Wikipedia. Triples can be generated either with the Extraction Framework, which handles infobox extraction, or with novel NLP-based approaches such as the Neural Extraction Framework. Unfortunately, the latter currently supports only English. We therefore welcome NLP and/or LLM-based solutions that target multilingual text.
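To make the infobox route concrete, here is a minimal toy sketch in Python of how "key = value" fields in an infobox could become triples. It is not the Extraction Framework's actual code or algorithm. The namespaces `http://hi.dbpedia.org/resource/` and `http://hi.dbpedia.org/property/`, the helper names, and the sample infobox are assumptions made for illustration only.

```python
import re

# Assumed namespaces for a Hindi chapter; the real ones may differ.
RESOURCE_NS = "http://hi.dbpedia.org/resource/"
PROPERTY_NS = "http://hi.dbpedia.org/property/"

# Matches infobox lines of the form "| key = value".
FIELD_RE = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")
# Matches a value that is exactly one wiki link: [[Target]] or [[Target|label]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def iri(namespace, label):
    """Build an IRI in angle brackets, using underscores for spaces."""
    return "<" + namespace + label.strip().replace(" ", "_") + ">"


def infobox_to_triples(page_title, wikitext):
    """Turn simple infobox fields into N-Triples-style strings (toy version)."""
    subject = iri(RESOURCE_NS, page_title)
    triples = []
    for line in wikitext.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # not a "| key = value" line
        key, value = match.groups()
        if not value:
            continue  # skip empty fields
        link = LINK_RE.fullmatch(value)
        if link:
            # A wiki link becomes a link to another resource.
            obj = iri(RESOURCE_NS, link.group(1))
        else:
            # Anything else becomes a Hindi-tagged string literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"' + escaped + '"@hi'
        triples.append(subject + " " + iri(PROPERTY_NS, key) + " " + obj + " .")
    return triples


# Hypothetical infobox text, for illustration only.
SAMPLE = """{{Infobox country
| राजधानी = [[नई दिल्ली]]
| आदर्श वाक्य = सत्यमेव जयते
| खाली =
}}"""
```

Calling `infobox_to_triples("भारत", SAMPLE)` yields one triple per non-empty field. A real extractor also has to handle nested templates, units, dates and the mappings to the DBpedia ontology, all of which this sketch ignores.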
Goal
Create the Hindi-language DBpedia chapter, to be reachable at hi.dbpedia.org. In particular:
Expose the knowledge graph to make it browsable via web.
Create a SPARQL endpoint to make it queryable.
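As a rough idea of what "queryable" could look like from a client's side, the sketch below builds a SPARQL Protocol GET request and parses a SPARQL JSON result using only the Python standard library. The endpoint URL `https://hi.dbpedia.org/sparql` is an assumption (the chapter does not exist yet), and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint location; the actual path is up to the deployment.
ENDPOINT = "https://hi.dbpedia.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(payload)
    names = data["head"]["vars"]
    return [
        {name: row[name]["value"] for name in names if name in row}
        for row in data["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP and return the parsed rows."""
    with urlopen(build_request(query, endpoint), timeout=30) as response:
        return parse_bindings(response.read().decode("utf-8"))


# Example query: the first ten statements about one (assumed) resource IRI.
QUERY = """
SELECT ?p ?o WHERE {
  <http://hi.dbpedia.org/resource/भारत> ?p ?o
} LIMIT 10
"""
```

Once an endpoint is live, `run_query(QUERY)` would return a list of dicts keyed by `p` and `o`. Until then, `build_request` and `parse_bindings` can be exercised offline.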
Material
See Warm-up tasks.
Project size
This project is medium-sized (175 hours).
Impact
Cultural and Educational Enrichment: Empower Hindi-speaking users with culturally relevant and easily accessible knowledge, fostering educational enrichment and linguistic inclusivity.
Semantic Search and NLP Applications: Enable advanced semantic search and natural language processing (NLP) applications in Hindi, opening avenues for innovation in information retrieval and analysis.
Community Engagement: Encourage community contributions, feedback, and collaboration in maintaining and expanding the Hindi ontology, ensuring continuous improvement and relevance.
In summary, this project seeks to contribute significantly to linguistic diversity in the semantic web domain by extending the DBpedia ontology to Hindi, promoting a more inclusive and accessible knowledge landscape for Hindi-speaking users.
Hello @tsoru,
I am interested in working on this project! I have recently studied and worked on NLP, and I am also looking for an LLM-based project.
A recent research project of mine involved exploring LSTM, BERT, and word vectors for text identification and tokenization, which could help recognize sensitive information in real-time video.
I am Debarghya Datta (Master's in CS, specializing in NLP). During my master's, I worked on domain-specific entity linking with a custom KG, and tackled the challenges of using a neural model for it, as most SOTA EL models (GENRE, BLINK) are trained on English corpora.
Because of this experience, I find this project particularly interesting and want to explore how LLMs, along with the awesome NLP tools created by the community, can be used effectively to solve this problem.
I’m highly interested in the GSoC 2024 project focusing on Semantic Web and NLP for DBpedia. With recent research publications in NLP and hands-on experience, I’m eager to contribute to empowering Hindi-speaking users with culturally relevant knowledge. I am excited about the opportunity to collaborate and make a meaningful impact.
Hello @tiwarisanju18, Ronak Panchal,
I am Rishit Agarwal, a bachelor’s student. I recently worked in the web semantics domain, using NLP and BERT models for tasks such as classification and studying the reasoning of LLMs using RDF syntax.
I was looking for projects that match my interests in NLP tools and web semantics, and I believe this project would help deepen my understanding of the domain.
I am highly interested in this project. I have attached my resume to this chat for your reference. Kindly go through it to judge my skills and experience.
Hello @tiwarisanju18 and fellow mentors, I am pursuing a Master of Science in Computer Science at Illinois Institute of Technology, Chicago. I am highly interested in this project. I have been part of FOSS Overflow 2024, organised by IIT Bhilai, and this opportunity also aligns with my interest in continuing to contribute to impactful open source projects.
This semester I have taken an elective in Information Retrieval, which essentially forms the base of NLP and thus relates to the fundamentals of this project.
As an international student currently in the USA, I understand the value of the preservation of vernacular language and culture and am excited to find a way to contribute to my roots.
I can get started with the warmup tasks as soon as possible to showcase my suitability for the project.
I am Yash Srivastava. I have previously worked in NLP and MT, and have good experience working with Indic languages. I really liked this problem and would be open to working on it. I also have good programming and open source experience.
Hello @tsoru, I am Shivam, a pre-final-year student at Guru Govind Singh Indraprastha University (Delhi), pursuing Electronics and Communication. I am eager to participate in this project. I have good programming skills. Thank you.
Thank you for the same. I have a small request: could you provide more reference material, such as technical papers? Anything helps.
Dimitris Kontokostas, Charalampos Bratsas, Sören Auer, Sebastian Hellmann, Ioannis Antoniou, George Metakides. Internationalization of Linked Data: The case of the Greek DBpedia edition. Web Semantics: Science, Services and Agents on the World Wide Web, Volume 15, September 2012, Pages 51–61, ISSN 1570-8268, doi:10.1016/j.websem.2012.01.001.