Interactive Dashboard for Datasets

This feature is about building an interactive dashboard for the datasets that arrive every month.
Going through the whole dataset every month is quite time-consuming, but summarizing it on a dashboard would give users a clear idea of what is included in the newly added dataset or datasets. With the datasets summarized, users could apply filters on the dashboard to instantly get the datasets they need.

The interactive dashboard would show the topics covered in the dataset (which is added every month), the size of the dataset, and quick links from which users can directly download the data they are interested in.
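To make the idea concrete, here is a minimal sketch in Python of the summary step behind such a dashboard. Everything in it is an assumption for illustration: the `DatasetRelease` record, its fields, the sample values and the `example.org` download links are hypothetical and do not reflect the actual Databus metadata.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRelease:
    """Hypothetical record for one dataset in a monthly release."""
    name: str
    month: str          # "YYYY-MM"
    topics: tuple       # topics covered by the dataset
    size_bytes: int
    download_url: str


def filter_releases(releases, month=None, topic=None):
    """Keep only releases matching the given month and/or topic."""
    return [
        r for r in releases
        if (month is None or r.month == month)
        and (topic is None or topic in r.topics)
    ]


def summarize(releases):
    """Aggregate the figures the dashboard would display."""
    topic_counts = Counter(t for r in releases for t in r.topics)
    return {
        "dataset_count": len(releases),
        "total_size_bytes": sum(r.size_bytes for r in releases),
        "topics": dict(topic_counts),
        "downloads": {r.name: r.download_url for r in releases},
    }


# Made-up sample data, only to exercise the functions above.
SAMPLE_RELEASES = [
    DatasetRelease("persons", "2020-01", ("people", "biography"), 1200,
                   "https://example.org/2020-01/persons.ttl"),
    DatasetRelease("places", "2020-01", ("geography",), 800,
                   "https://example.org/2020-01/places.ttl"),
    DatasetRelease("persons", "2019-12", ("people",), 1000,
                   "https://example.org/2019-12/persons.ttl"),
]

january = filter_releases(SAMPLE_RELEASES, month="2020-01")
summary = summarize(january)
```

The filter and the summary are kept separate so that each dashboard control (a month slider, a topic drop-down) only changes the filter arguments, while the summary is recomputed from whatever remains.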

For reference, the interactive dashboard I’m proposing would look like the one shown below:
https://olympichistory.azurewebsites.net/

Above is a sample dashboard that I designed for one particular dataset. It has sliders with which users can adjust the graphs to their needs, as well as multi-select drop-down menus for filtering particular graphs. This dashboard is just a sample of how the datasets could be summarized.

Hi, I moved this project idea to the main category ‘Projects’.

@karankharecha also in reply to Recommendation System for Databus and also your chatbot proposal for GSoC

In principle, these are the right ideas. We were unable to progress on this before as important meta information about the data was not available. We finished the essentials though and now is a better time to resume work on interfaces.

Let me sum up what there is currently:

  • Monthly releases are stable and the Databus seems to be doing fine. The main idea here is that people publish any data under their accounts, e.g. their own or data they extracted from somewhere else such as DBpedia. Good-quality data gets picked up and fused into global.dbpedia.org
  • While the basics cover files and users, we did not finish implementing additional statistics, so-called Mods, which are essentially third-party analysis plugins for the Databus. These would also have the role of semantic tagging and content indexing
  • Mods are the systematic approach; however, we finished a preliminary form of semantic indexing, which we do manually now and will do automatically later. This can be seen e.g. here where in the next version, we will also include the dataset reference, see another prototype deployment here with DNB, Musicbrainz, Geonames, etc. The basis for this is called PreFusion and data is here
  • The prefusion is an aggregation of several datasets and it is partitioned by properties into files. So if you are talking about a dashboard, the user might not be interested in all or any Databus datasets, but would probably configure which part of the prefusion they would like to receive in terms of:
    – Subjects, i.e. all persons, companies and cities
    – properties for the above selected
    – maybe the export vocabulary, i.e. export dbo:birthdate as foaf:birthdate or wd:Pxxx
  • In addition, we would also encourage users to add more datasets to the prefusion, add/fix links or map additional vocabularies, but this can also be linked later. Getting the information out there is the most important goal now.
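
The selection described in the list above (subjects, properties, export vocabulary) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `PrefusionSelection` class, the row layout `(subject, subject_type, property, value)` and the sample rows are hypothetical and are not the actual PreFusion file format.

```python
from dataclasses import dataclass, field


@dataclass
class PrefusionSelection:
    """Hypothetical user configuration for a prefusion export."""
    subject_types: set                  # e.g. persons, companies, cities
    properties: set                     # properties wanted for those subjects
    export_vocabulary: dict = field(default_factory=dict)  # property renames

    def apply(self, rows):
        """Yield (subject, exported_property, value) for matching rows.

        Each row is assumed to be (subject, subject_type, property, value).
        """
        for subject, subject_type, prop, value in rows:
            if subject_type in self.subject_types and prop in self.properties:
                # Rename the property if an export mapping exists.
                yield (subject, self.export_vocabulary.get(prop, prop), value)


# Made-up rows, only to show the filtering and renaming.
SAMPLE_ROWS = [
    ("ex:Alice", "Person", "dbo:birthdate", "1980-05-01"),
    ("ex:Alice", "Person", "dbo:height", "1.70"),
    ("ex:Acme", "Company", "dbo:foundingYear", "1999"),
    ("ex:Leipzig", "City", "dbo:populationTotal", "600000"),
]

selection = PrefusionSelection(
    subject_types={"Person", "City"},
    properties={"dbo:birthdate", "dbo:populationTotal"},
    export_vocabulary={"dbo:birthdate": "foaf:birthdate"},
)

exported = list(selection.apply(SAMPLE_ROWS))
```

Since the prefusion is already partitioned by property into files, a real implementation could probably pick the files first and filter subjects second; the sketch ignores that and filters row by row for simplicity.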

Besides the interactive dashboard we can also make a visualisation first.

Side note: the previous version of the prefusion that you are seeing is loaded here: https://github.com/dbpedia/gfs/tree/master/gfs-data-browser into a read-only MongoDB:

    mongo_url: "mongodb://readonly:gfs@88.99.242.78:8989/prefusion",

The second prototype has the newer data.

@kurzum
As discussed in the Tasks for Volunteers thread regarding the video meeting with mentors, can we have a discussion on this topic? Earlier you said there was a lot of remodeling in the projects. So if we can discuss in the meeting how to connect the current project model with these three (dashboard, recommendation system and maybe chatbot), then we can get started with the code base. I’m not very familiar with the current architecture of the project, and I believe a meeting would give me a clearer idea about introducing new features.

@karankharecha What about Friday 10 am German time, 15:30 Indian time? You are in the Indian Time Zone IIRC.

@kurzum
Yes, sure.
10 am German time is perfect for me.
In case the meeting is on Skype, my Skype ID is karankharecha

Ok, if anybody else wants to join, they need to message me.

@karankharecha I put some thoughts into what we really need. It prioritizes displaying the DBpedia data.

Chapter Viewer: DBpedia has many chapters, i.e. the Dutch, Spanish, etc. It would be awesome to have a dashboard that takes a lang=nl parameter and then shows the health of the language-specific data, i.e. how it grows each month and how good the mappings are, guides users where to edit, shows online stats of the endpoint like http://nl.dbpedia.org/sparql and also shows the collection of what is loaded. I think this would be a very good start. Then it could also help browse other datasets related to the national chapters. The Polish chapter has some census data for example.
I think this will be a perfect start.
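
A rough sketch in Python of how the lang parameter could drive such a viewer. The assumptions here are loud ones: that every chapter endpoint follows the http://nl.dbpedia.org/sparql pattern with only the language code swapped, that a plain triple count is a useful health figure, and that month-to-month deltas are what "growth" means. The function names and the sample counts are hypothetical. No request is sent; the code only builds the URL, using the standard `query` parameter of the SPARQL protocol.

```python
import re
from urllib.parse import urlencode

# Simple health figure: total number of triples at the endpoint.
COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def chapter_endpoint(lang):
    """Build the SPARQL endpoint URL for a chapter from its language code.

    Assumes all chapters follow the pattern of the Dutch endpoint.
    """
    if not re.fullmatch(r"[a-z]{2,3}", lang):
        raise ValueError(f"not a language code: {lang!r}")
    return f"http://{lang}.dbpedia.org/sparql"


def triple_count_url(lang):
    """URL that would ask the chapter endpoint for its triple count."""
    return chapter_endpoint(lang) + "?" + urlencode({"query": COUNT_QUERY})


def monthly_growth(counts_by_month):
    """Return (month, delta) pairs from a {"YYYY-MM": count} mapping."""
    months = sorted(counts_by_month)
    return [
        (cur, counts_by_month[cur] - counts_by_month[prev])
        for prev, cur in zip(months, months[1:])
    ]


# Made-up counts, only to show the growth calculation.
growth = monthly_growth({"2019-11": 100, "2019-12": 130, "2020-01": 125})
```

Validating the language code before building the URL keeps arbitrary user input out of the host name, which matters once lang comes from a request parameter.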

@kurzum
Hello, I now have a reliable internet connection, so I can carry on with the work that we discussed.
The idea of the health status of the data that you mentioned is something that I would like to start working on.
Can we set up a meeting on Skype to discuss it, so that I can plan how to design the dashboard?

@karankharecha yes, we can. Please suggest a time.

@kurzum Will 11am (Germany time) on Saturday work for you?

@kurzum is it possible for you tomorrow?
31st Jan, Friday, 10am (Germany time).