DBpedia Hindi Chapter — GSoC 2024

ananyaiitbhilai · February 25, 2024, 6:28pm

Hi @rishitagarwal2404 @kumarshivam @nikkvijay32 @yash-srivastava19 @arup_chauhan @StarfishCode @Injector_Ash @deba-iitbh @AnanyaD

I shall be a junior mentor for this project. I’m here to help you and answer any queries you have about the project. Please dive deep into the Problem statement, get familiar with the Extraction Framework, and play around with the DBpedia tools at your disposal.

There are several warm-up tasks and research papers in the already described Problem statement to kickstart your journey.

Remember, the GSoC proposal deadline is April 2nd, but why wait? Get a draft over to me at ananyahooda04@gmail.com, and at shodhguru21@gmail.com (comment access, please!). Here is an example of an excellent proposal for your reference. Hoping to see some novel methodologies involving NLP or LLMs or something out of the box to go about the Problem.

Cheers!
Ananya

kumarshivam · February 26, 2024, 1:32pm

@ananyaiitbhilai thank you for sharing this valuable information.

nikkvijay32 · February 27, 2024, 6:46am

@ananyaiitbhilai Thank you so much, Ananya! Really looking forward to diving into the project and learning from your guidance.

nikkvijay32 · February 27, 2024, 8:04am

Hi mentors,

Regarding the DBpedia Chapter in Hindi language project: this initiative is an exciting opportunity to bridge the gap in knowledge accessibility for Hindi-speaking communities and to enrich the global semantic web with linguistic diversity.

1. Understanding Ontology Usage: Analyze classes and properties in Hindi Wikipedia using statistical methods to identify key ontological elements.
2. Implementing Autocompletion: Develop a feature suggesting relevant ontology elements based on statistical usage data to facilitate query construction and exploration.
3. Exploring Machine Learning: Apply machine learning to improve prediction accuracy for ontology element recommendations, exploring existing research for applicable models.
4. Comparative Analysis and Optimization: Evaluate and refine both statistical and machine learning approaches to optimize the autocompletion feature.
5. Deployment and Integration: Integrate the optimal solution into the Archivo web service for seamless use within the DBpedia ecosystem, enhancing user experience and community engagement.
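
To make steps 1 and 2 concrete, here is a minimal sketch of frequency-based autocompletion. It assumes the usage statistics are simple counts of how often each property appears as a predicate. The function names (`build_usage_counts`, `suggest`) and the sample triples are hypothetical illustrations, not part of any DBpedia tool or dataset.

```python
from collections import Counter


def build_usage_counts(triples):
    """Count how often each property appears as the predicate of a triple."""
    return Counter(predicate for _subject, predicate, _obj in triples)


def suggest(prefix, usage_counts, limit=5):
    """Return properties starting with `prefix`, most frequently used first."""
    matches = [
        (name, count)
        for name, count in usage_counts.items()
        if name.lower().startswith(prefix.lower())
    ]
    # Sort by descending count, then alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _count in matches[:limit]]


# Hypothetical sample data (placeholder values, not real DBpedia statistics).
sample_triples = [
    ("दिल्ली", "dbo:country", "भारत"),
    ("मुंबई", "dbo:country", "भारत"),
    ("गंगा", "dbo:country", "भारत"),
    ("मुंबई", "dbo:populationTotal", "placeholder"),
    ("दिल्ली", "dbo:populationTotal", "placeholder"),
    ("भारत", "dbo:capital", "दिल्ली"),
]

usage_counts = build_usage_counts(sample_triples)
print(suggest("dbo:c", usage_counts))
```

A machine learning ranker (step 3) could later replace the sort key while keeping the same interface, which would make the comparison in step 4 straightforward.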

This is a preliminary idea based on my present understanding. Please feel free to correct any misconceptions and suggest additions or alterations to the concept. Thank you in advance!

Best regards,
Nikhil

tiwarisanju18 · March 12, 2024, 2:33pm

Dear @nikkvijay32

Thank you for exploring it.
Please include it in the proposal and send it.

Thank You

ananyaiitbhilai · March 19, 2024, 6:27am

Hi @rishitagarwal2404 @kumarshivam @nikkvijay32 @yash-srivastava19 @arup_chauhan @StarfishCode @Injector_Ash @deba-iitbh @AnanyaD

As the application submission period has commenced, it would be nice if you could share your proposal drafts and complete the warm-up tasks. The earlier you share them, the easier it is to get suggestions from the mentors. Detail-oriented proposals are appreciated.

Some more instructions: Click Here

kumarshivam · March 22, 2024, 1:40pm

Hi @ananyaiitbhilai, I have sent my proposal. Kindly review it so I can improve it further :)

ananyaiitbhilai · March 23, 2024, 1:40am

Hi @kumarshivam, I didn’t receive your proposal. Please share it as a Google Doc with my email address, ananyahooda04@gmail.com

kumarshivam · March 23, 2024, 3:47am

Hi @ananyaiitbhilai, I have re-sent the email with the proposal attached.

Shashanx · March 29, 2024, 7:37am

Hey, I’ve read all the previous responses, and now that I’m more informed about GSoC, I want to participate in this forum. I’m a Hindi speaker, and I’ve thoroughly read everything about the warm-up. I can send my proposal by tomorrow. Could you please let me know if I’m eligible? @ananyaiitbhilai @tiwarisanju18

Thank you
Shashank Jha

Shashanx · March 29, 2024, 4:24pm

@ananyaiitbhilai @tiwarisanju18 Dear mentors, I have sent my proposal to the email addresses you provided.
Kindly let me know if anything needs to be added to it.
Thank you
Shashank Jha

tiwarisanju18 · March 29, 2024, 4:41pm

Dear Shashank
We have received your proposal. You will be notified soon.

Thank You

tiwarisanju18 · March 29, 2024, 4:45pm

Hi @rishitagarwal2404 @kumarshivam @nikkvijay32 @yash-srivastava19 @arup_chauhan @StarfishCode @Injector_Ash @AnanyaD @ananyaiitbhilai

Dear All

Please send your proposal to shodhguru21@gmail.com or tiwarisanju18@ieee.org, as I have received a few proposals.

Thank You

kumarshivam · March 29, 2024, 5:17pm

@tiwarisanju18 Sir, I have sent my proposal to you.

ananyaiitbhilai · March 31, 2024, 5:22am

Hi,
The following are some common mistakes in the proposals:

Lack of clarity in the problem statement
Not having gone through the warm-up tasks and the Neural Extraction Framework project
The timeline section is not organized properly. We expect you to have completed the warm-up tasks and become acquainted with the project while writing the proposal (before starting the project).

Please focus more on the background work, the methodology, and your understanding of the problem statement.

ananyaiitbhilai · April 1, 2024, 11:03am

Hi! @rishitagarwal2404 @kumarshivam @nikkvijay32 @yash-srivastava19 @arup_chauhan @StarfishCode @Injector_Ash @deba-iitbh @AnanyaD @Shashanx

Tomorrow is the deadline for the Proposal submission.

Apart from the previous advice, here is some general advice before submitting the proposal on the portal:

Please use the previously shared proposal as a reference
Please check the grammar and spelling, along with the hyperlinks
Clearly outline your project plan, feasibility, and potential challenges, aligning with the overall idea. Consider including a mock implementation for bonus points.
Allocate the final week of the coding period for buffer time and project summary.
Employ assertive language, e.g., “The (task) will be completed by (date),” instead of “I will do this.”
Demonstrate your research and understanding by identifying potential challenges.
Please state how many hours per week you will commit to this project

You can expect the sections to be prioritized in the following order:

Methodology (where you describe your solution).
Background work (a mock implementation using a tool, evidence of having gone through the warm-up tasks, interesting research papers related to our problem statement)
Your own understanding of the Problem statement described
The timeline described
Your motivation for joining this project
Previous relevant experience
Some other sections
Your availability during the summers

We expect you to have completed the warm-up tasks, explored the relevant tools and frameworks, and become acquainted with the project while writing the proposal (before starting the project).
Please explicitly mention your methodology for each of the 3 goals in the problem statement.

It’s always good to send your proposal to the mentors for feedback before submitting it on the official portal. However, if you plan to submit directly on the official portal due to lack of time, you are welcome to do so.
We received some emails directly from applicants who had not introduced themselves on the discussion forum. It’s always good to introduce yourself on the discussion forum first and to ask questions there, since this is an open-source community. Proposal feedback, however, can be discussed privately.

A good starting point would have been going through the Neural Extraction Framework project. It would have helped you understand Goal 1 of this project's problem statement better and given you some insight into framing the solution and methodology part. The GSOC-2022 folder would help you understand the problem statement and get acquainted with a significant part of triple generation. If you have explored the project folder in depth, you will have come across a tool that is now also available for multilingual text. In the GSOC-2023 folder you will find the end2end pipeline with other significant parts such as entity linking. You might have to find a tool on your own for multilingual entity linking and some other steps. This is one possible solution, but a better one might exist using LLMs (you might have to go through some recent research papers). This is a hint for the solution to Goal 1.

Similarly, for the other Goals (2 and 3), we expect you to give us system architecture details. You might need to go through the architecture of the current DBpedia and the workings of its endpoints. Goal 1 is more research and methodology oriented, while Goals 2 and 3 are more about system design, implementation, and deployment. For the system architecture details, it would be nice to include a diagram and write in a point-wise manner. Please avoid long paragraphs in the methodology part for Goals 2 and 3.

Please explicitly mention in your proposal whether you will extract triples from the whole Wikipedia page or plan to focus only on the infoboxes for now. The latter approach would require you to go through the current DBpedia pipeline and would only replicate the English version of DBpedia for Hindi. However, it would be excellent to extract all the triples from the text as well, add them to the Hindi version of DBpedia, and deploy that end2end Hindi extraction framework in the Hindi DBpedia. It would be a great addition.
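
To illustrate what the infobox-only option involves, here is a rough sketch that turns `| key = value` lines of an infobox into triples. It is not the DBpedia Extraction Framework, which relies on mappings and handles many more cases. The namespaces `RESOURCE_NS` and `PROPERTY_NS`, the function `infobox_to_triples`, and the sample wikitext are assumptions made up for this example.

```python
import re

# Hypothetical namespaces, assumed for illustration only.
RESOURCE_NS = "http://hi.dbpedia.org/resource/"
PROPERTY_NS = "http://hi.dbpedia.org/property/"

# Matches one infobox parameter line such as "| country = [[भारत]]".
PARAM_LINE = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")
# Matches a value that is exactly one wiki link, with an optional label.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def to_iri(namespace, name):
    """Build an IRI by replacing spaces with underscores, as wiki titles do."""
    return namespace + name.strip().replace(" ", "_")


def infobox_to_triples(page_title, wikitext):
    """Return (subject, predicate, object) tuples for non-empty parameters."""
    subject = to_iri(RESOURCE_NS, page_title)
    triples = []
    for line in wikitext.splitlines():
        match = PARAM_LINE.match(line)
        if not match:
            continue  # not a "| key = value" line
        key, value = match.group(1), match.group(2)
        if not value:
            continue  # skip parameters left empty
        link = WIKI_LINK.fullmatch(value)
        # A linked value becomes a resource IRI; anything else stays a literal.
        obj = to_iri(RESOURCE_NS, link.group(1)) if link else value
        triples.append((subject, to_iri(PROPERTY_NS, key), obj))
    return triples


# Made-up sample wikitext, not copied from a real page.
sample = """{{Infobox settlement
| name = दिल्ली
| country = [[भारत]]
| website =
}}"""

for triple in infobox_to_triples("दिल्ली", sample):
    print(triple)
```

Extracting triples from the article text cannot be done with patterns like these. It needs the relation extraction and entity linking steps discussed above, which is why the two options differ so much in scope.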

Please make last-minute refinements based on the advice on the discussion forum and the advice you have received from us personally, by email or elsewhere, if you sent us your proposal.

All the best
Long-Hail Open Source!
Ananya