I talked to the Underlay contact, and his answer was this:
"Talking to Joel, architect of the current protocols: the Underlay libraries and interfaces may not be ready for this by the summer. We are just starting to hire someone to help build out our first central registry and related packaging tools. Do you think that creating a toolchain that lets a subcommunity create a Databus package, and learning to use IPFS interfaces to store those packages on IPFS, might be sufficient for a full summer’s project? That’s already learning two different languages and testing with a data community."
So I guess it doesn’t make sense to go for Underlay yet, also because it is very RDF-specific. Meanwhile I had a look at IPFS some more, and I think integrating Databus and IPFS would be a really cool project. There are two sides to this: publication (upload) and consumption (download).
At the moment, we are using Maven features extensively for publication. This means either copying files to /var/www of an Apache Web Server or NGINX, or, lately, using the Maven WebDAV Wagon. So the process is one of:
- people run the upload script on the same server, copying the files to /var/www
- people run the script on their own server/laptop and push to the publication server via WebDAV or SSH
So a question here is how to get local data into IPFS. Maybe an IPFS client needs to run, or maybe it is just simple file copying. You also need to get the IPFS hash back, as this needs to go into the DataID and onto the Databus.
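To make this concrete, here is a minimal sketch in Scala of how an upload script could hand a file to IPFS and capture the hash for the DataID. It assumes the go-ipfs CLI is installed with a daemon running; the file name and object name are made up:

```scala
import scala.sys.process._

// Sketch only: shells out to the go-ipfs CLI.
// `ipfs add -Q` prints nothing but the final hash of the added file.
object IpfsPublish {
  def addFile(path: String): String =
    Seq("ipfs", "add", "-Q", path).!!.trim

  def main(args: Array[String]): Unit = {
    val hash = addFile("mydataset.ttl.bz2") // hypothetical dataset file
    println(s"IPFS hash for the DataID: $hash") // goes into the DataID / onto the Databus
  }
}
```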
As far as I understood, there are several ways to download IPFS files. There are download clients in different implementations, and normally you are supposed to share the files again afterwards. There is also a Scala wrapper and a wget-like implementation. I didn’t get very deep into it, but these features should be possible:
- download without having a local IPFS node, e.g. via a public HTTP gateway (see the gateway sketch after this list)
- download and share via a local IPFS node (note: for us it wouldn’t be a problem to host a local node)
- subscribe to new versions, i.e. we have collections such as https://databus.dbpedia.org/dbpedia/collections/latest-core where users can get the latest version of an artifact. (The collection resolves to a SPARQL query: curl -H "Accept: text/sparql" https://databus.dbpedia.org/dbpedia/collections/latest-core ). I am not sure how IPFS reacts to file changes or updates (as far as I understand, content is immutable, so every update yields a new hash), but combined with the Databus/Maven-like structure, it should be easy.
- there is the pinning feature for IPFS clusters (https://cluster.ipfs.io/), which would be interesting: we could have a LOD Cloud cluster swarm that starts backing up the whole LOD Cloud (see the pinning sketch below). This would be a killer feature.
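For the first download option, here is a minimal sketch that fetches a file over plain HTTP without any local node. The CID and file name are placeholders; any public gateway such as https://ipfs.io should work (requires JDK 11+ for java.net.http):

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.file.Path

// Sketch only: "download without having a local IPFS node" via a public gateway.
object GatewayDownload {
  def main(args: Array[String]): Unit = {
    val cid = "QmYourDatasetHashHere" // placeholder, would come from the DataID
    val request = HttpRequest
      .newBuilder(URI.create(s"https://ipfs.io/ipfs/$cid"))
      .build()
    val saved = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofFile(Path.of("dataset.ttl")))
    println(s"HTTP ${saved.statusCode()}, saved to ${saved.body()}")
  }
}
```

Note this trades decentralisation for convenience: the gateway serves the content, and nothing is shared back to the network.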
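And for the LOD Cloud backup idea, the pinning sketch referenced above. The hash list is made up; in practice it would come from the DataIDs in a Databus collection. It assumes a plain local node; with IPFS Cluster the same call would go through ipfs-cluster-ctl instead of the ipfs CLI:

```scala
import scala.sys.process._

// Sketch only: a swarm member pins each hash from a Databus collection,
// so the referenced files stay replicated on its local node.
object LodCloudPin {
  def pin(hash: String): String =
    Seq("ipfs", "pin", "add", hash).!!.trim

  def main(args: Array[String]): Unit = {
    val hashes = Seq("QmHashOne", "QmHashTwo") // placeholders from the collection
    hashes.foreach(h => println(pin(h)))
  }
}
```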
I think the skeleton of your work plan is realistic. Before Milestone 1 (implementation with JUnit tests), I would recommend an earlier milestone with some sort of hacky, vertical prototype, in order to gain experience with IPFS and to have something working with a small dataset early on. This follows agile, rapid-prototyping practice, and it is a good milestone at which to discuss and adjust the following tasks & timeline. Otherwise you spend four weeks on implementation and testing just to find out that the requirements are actually different (which happens in almost 50% of all software projects).