DBpedia Evolution Project - GSoC2020

Description
DBpedia plans to release data much more frequently in the near future using a new, improved release pipeline. As work on the extraction framework continues, the content of the released data will change over time. To ensure the consistency and completeness of new dataset releases, verification measures have to be designed and implemented. This starts with simple checks over file names and sizes and goes on to more complicated tasks, such as tracking and documenting the path of triples through the DBpedia extraction framework or logging the impact of specific mapping changes.

Goal
The goal of this task is to improve the data quality of the continuously released DBpedia datasets by detecting erroneous changes in the extraction and release process. The result is a verification pipeline that compares the results and processes of previous and upcoming DBpedia releases.
Warm-up tasks:
Download the latest DBpedia release (2016-10) and the current pre-release (2019-08-30) and implement a simple check over file names and sizes to verify the completeness of the pre-release. The process should log any files of the 2016-10 release that could not be matched to a file in the pre-release. Matched files should be compared by size and should be similar in size (at most 80% smaller or 200% larger).
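A minimal sketch of such a check in Python, under several assumptions that are not part of the task description: both releases are already downloaded into local directories, files are matched by identical relative path (real releases may need a looser matching rule), and "at most 80% smaller or 200% larger" is read as an allowed size range of 0.2x to 3.0x the 2016-10 size. The names collect_sizes and compare_releases are hypothetical helpers, not part of any DBpedia tooling.

```python
import logging
from pathlib import Path

logger = logging.getLogger("release_check")


def collect_sizes(root):
    """Map each file path (relative to root) to its size in bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_releases(old_sizes, new_sizes, max_shrink=0.80, max_growth=2.00):
    """Compare two {name: size} mappings.

    Returns (unmatched, out_of_range):
      unmatched    - names from the old release with no match in the new one
      out_of_range - (name, old_size, new_size) for matched files whose new
                     size falls outside the assumed tolerance band
    """
    unmatched = []
    out_of_range = []
    for name, old_size in sorted(old_sizes.items()):
        if name not in new_sizes:
            logger.warning("no match in pre-release: %s", name)
            unmatched.append(name)
            continue
        new_size = new_sizes[name]
        # Assumed reading of the tolerance: 80% smaller -> 0.2x, 200% larger -> 3.0x.
        lower = old_size * (1.0 - max_shrink)
        upper = old_size * (1.0 + max_growth)
        if not lower <= new_size <= upper:
            logger.warning(
                "size out of range: %s (%d -> %d bytes)", name, old_size, new_size
            )
            out_of_range.append((name, old_size, new_size))
    return unmatched, out_of_range
```

With two local directories (the paths here are placeholders), the check could be run as compare_releases(collect_sizes("release-2016-10"), collect_sizes("pre-release-2019-08-30")). Calling logging.basicConfig(level=logging.WARNING) beforehand prints the warnings to the console.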
Mentors
Jan Forberg
Keywords
Data Quality

I am a second-year student at Jamia Millia Islamia, India, and am interested in pursuing this project for GSoC 2020. I am currently working on the warm-up task. Please advise on how to download the current pre-release, as running the query reports that the collection cannot be found. Also, which files from the 2016-10 release should I compare against?

Hi, this collection should work: https://databus.dbpedia.org/dbpedia/collections/pre-release-2019-08-30/
You can try this download client to get the files: https://hub.docker.com/repository/docker/dbpedia/minimal-download-client.

Hello,
This sounds like a very interesting project, and I would love to participate in it for GSoC 2020. I am a Master's student in Software Engineering at IIIT Allahabad. I have done one project based on DBpedia and the Semantic Web, so I am very interested in this project. Could you kindly give me the initial instructions for solving the warm-up tasks?

Thank you