Data Quality Dashboard - GSoC2020

DBpedia offers large quantities of structured data. However, its data quality is insufficient in places, with problems originating from several sources, e.g. incorrect extractions and value transformations in the extraction framework, inconsistent mappings, incorrect data in Wikipedia articles, and general incompleteness.

The goal is to visualize a set of metrics in an easy-to-read, interactive UI that facilitates the decision on what should be fixed next in DBpedia.

The interface will help DBpedia contributors adopt a “data quality first” attitude and enable data-driven prioritization of development tasks.
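As a rough illustration of the kind of metric such a dashboard could show, here is a minimal sketch that measures one aspect only, completeness, and ranks the gaps so the worst ones come first. Everything in it is an assumption made for illustration: the sample rows, the class and property names, and the helper functions `completeness` and `rank_by_gap` are hypothetical and not part of any DBpedia tooling.

```python
from collections import defaultdict

# Hypothetical sample of (entity, class, property) rows; not real DBpedia data.
TRIPLES = [
    ("e1", "Person", "birthDate"),
    ("e1", "Person", "birthPlace"),
    ("e2", "Person", "birthDate"),
    ("e3", "Person", "birthPlace"),
    ("e4", "City", "populationTotal"),
    ("e5", "City", "country"),
    ("e4", "City", "country"),
]


def completeness(triples):
    """Return {(class, property): share of the class's entities that have the property}."""
    entities_by_class = defaultdict(set)
    entities_by_pair = defaultdict(set)
    for entity, cls, prop in triples:
        entities_by_class[cls].add(entity)
        entities_by_pair[(cls, prop)].add(entity)
    return {
        pair: len(entities) / len(entities_by_class[pair[0]])
        for pair, entities in entities_by_pair.items()
    }


def rank_by_gap(scores):
    """Sort (class, property) pairs from least to most complete; ties are broken by name."""
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    for (cls, prop), share in rank_by_gap(completeness(TRIPLES)):
        print(f"{cls}.{prop}: {share:.0%} of entities covered")
```

A real dashboard would read these counts from the actual datasets rather than from a hard-coded list, and the ranking is only one possible way to suggest what to fix next.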

This idea was already proposed in last year’s GSoC. In my opinion, it could be really useful to develop a Data Quality dashboard for DBpedia. What do you think about this proposal?

To clarify, did you propose it last year? I think @karankharecha proposed one. We would probably mentor two or three DQ dashboards this year.

The topic is very difficult, though. DQ is a measure of fitness for use, so only a user can actually evaluate DQ. Per se, it is therefore impossible to build something semantic here.

What makes sense here is to:

@lucav48 this is you?
If you have a background in network analysis/AI, this can help a lot. But you need to commit to a specific task.

Hi @kurzum! This is not my idea. Last year I saw some proposals for building a Data Quality Dashboard, but none of them were accepted. I thought it would be nice to propose this idea again!

However, that is me :grin: I am willing to be a mentor for this project, but we should certainly define the idea better.

ok, cool, welcome.
I tagged you as GSoC Mentor. @SandraPraetor @tsoru. We can work together on this. On Friday I am talking to Karan, who proposed it last year. He did quite good things: see Interactive Dashboard for Datasets and

@lucav48 I merged some of your ideas into Dashboard for Language/National Knowledge Graphs
Data quality is a super hard topic, and this complexity is multiplied by managing errors and issues. I focused the other idea on visualisation, with the option to pick some more features like data quality. If anybody works on data quality, it is important to focus on one aspect, i.e. no generic solutions.

Thank you @kurzum! I’d be glad to contribute to this project.