Archives and Inter-Planetary Linked Data: The Perfect Fit

Image by Pete Linforth from Pixabay

Decentralised, Peer-To-Peer, Blockchain-Based Academic Archiving & Publishing Ecosystem

We’re well underway with our development of KnowledgeArc Network, the decentralised, peer-to-peer, blockchain-based academic archiving and publishing ecosystem.

Latest Decentralised Technologies

By employing the latest decentralised technologies, such as the Inter-Planetary File System (IPFS), OrbitDB and Ethereum, we’re addressing the need for truly permanent repository solutions: repositories that are permanent, immutable and censorship-resistant.

The Inter-Planetary Linked Data (IPLD) model gives us a new way to describe the information stored on KnowledgeArc Network, along with the metadata that describes it.

IPLD works hand in hand with IPFS, a decentralised peer-to-peer file storage system: IPFS uses the IPLD model to structure and link the data it stores.

Location-Based vs. Content-Based Addressing

The brilliance of IPFS lies in the way it references what it stores. Instead of the traditional method of using a location-based address for a document or file, IPFS identifies the file by a unique identifier, or hash, derived from the file’s contents. This is known as content addressing. Not sure what that means?

Here’s a brief example:

Let’s say I have some very important research I’ve undertaken and I want to distribute it to as many people on the web as I can reach.

Using some kind of web-publishing software, I upload this important research in a PDF I’ve called “my-academic-research-that-will-change-the-world.pdf” from my computer to a web site somewhere: https://my.uni.edu

I put it in a folder called “research-papers”.

I then distribute this file to everyone, with the web address https://my.uni.edu/research-papers/my-academic-research-that-will-change-the-world.pdf.

It’s critical that this research is easily accessible, unchangeable, and built to last for generations.

There are a number of problems with this solution:

  • What happens if the web site address changes?
  • What happens if the web site gets moved?
  • What happens if the web site administrator decides to change the research papers path to something else?
  • What happens if this file gets copied to other web sites?
  • How do I ensure people know that the file my-academic-research-that-will-change-the-world.pdf contains MY research and hasn’t been changed, manipulated or completely replaced?

Basically, the problem we have is that this important research cannot be guaranteed to:

  • be accessible using the same address
  • be unique
  • be tamper-resistant

IPFS Solves These Problems

IPFS solves these problems by generating an identifier, or “hash”, from the contents of the file. The hash is computed with a cryptographic hash function (SHA-256 by default), so the chance of two different files producing the same hash is so small that, in practice, it never happens.
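To see why a content hash works as an identifier, here is a minimal sketch using SHA-256 from Python’s standard library. It is only an illustration: real IPFS identifiers (CIDs) also split files into chunks and wrap the digest in extra encoding, so the value computed here is not the CID IPFS would give the same file. The byte strings are made-up placeholders standing in for the PDF.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes: an identifier derived
    purely from the content, not from where the content is stored."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical placeholder bytes standing in for the PDF's contents.
original = b"My academic research that will change the world."
copy_on_another_site = b"My academic research that will change the world."
tampered = b"My academic research that will change the world!"

# Identical content gives an identical identifier, wherever it lives.
print(content_hash(original) == content_hash(copy_on_another_site))  # True
# Changing a single character gives a completely different identifier.
print(content_hash(original) == content_hash(tampered))  # False
```

The identifier depends only on the bytes, so moving the file to a new web site or renaming its folder has no effect on it, while any edit to the contents does.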

By generating a hash based on the contents of my-academic-research-that-will-change-the-world.pdf, I can now locate this file no matter where it’s stored.

Additionally, IPFS includes a network of computers which can store this file so that it can be found from the network rather than from one computer (for example, https://my.uni.edu).
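The same idea extends to a network of machines. The sketch below is a toy model, not how IPFS actually finds content (IPFS uses a distributed hash table and the Bitswap protocol): each hypothetical `Node` stores blocks keyed by their SHA-256 hash, and the hypothetical `fetch` function accepts a copy from any node only if its contents hash back to the requested key.

```python
import hashlib
from typing import Dict, List, Optional


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Node:
    """A toy peer (hypothetical) that stores blocks keyed by their content hash."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = content_hash(data)
        self.blocks[key] = data
        return key


def fetch(network: List[Node], key: str) -> Optional[bytes]:
    """Ask each peer for the block and return the first copy whose contents
    hash back to the requested key; corrupted copies are skipped."""
    for node in network:
        data = node.blocks.get(key)
        if data is not None and content_hash(data) == key:
            return data
    return None


uni = Node("my.uni.edu")
mirror = Node("mirror")
network = [uni, mirror]

paper = b"My academic research that will change the world."
key = uni.put(paper)
mirror.put(paper)

# The original site's copy gets corrupted. The corruption is detected because
# the bytes no longer match the key, so the mirror's intact copy is returned.
uni.blocks[key] = b"Something else entirely."
print(fetch(network, key) == paper)  # True
```

The caller never needs to know which machine holds the file: the key alone is enough to find it and to check that what came back is genuine.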

Linked Data – Decentralised

IPLD is another part of this decentralised picture: it links pieces of data together, using the content identifier of each piece as the link.

Again, taking our research paper, “my-academic-research-that-will-change-the-world.pdf”, I can now add metadata and link it to the document.

Metadata might include:

  • the title of the paper
  • authors
  • keywords
  • an abstract from the paper
  • other important descriptive information
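A metadata record like this can be sketched as its own block that links to the document by hash. In this minimal sketch the link shape `{"/": ...}` is borrowed from IPLD’s DAG-JSON encoding, but the value is a plain SHA-256 hex digest rather than a real CID; the helper names, title, author and keywords are all hypothetical.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def put_block(store: dict, value) -> str:
    """Serialise a value deterministically (sorted keys, no spaces), store it
    under its hash, and return the hash so other blocks can link to it."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode()
    key = content_hash(encoded)
    store[key] = encoded
    return key


def link(key: str) -> dict:
    # Link shape borrowed from DAG-JSON ({"/": ...}); here the value is a
    # plain SHA-256 hex digest, not a real CID.
    return {"/": key}


store = {}
pdf_bytes = b"%PDF placeholder contents"  # hypothetical file contents
pdf_key = content_hash(pdf_bytes)
store[pdf_key] = pdf_bytes

metadata = {
    "title": "My Academic Research That Will Change The World",
    "authors": ["A. Researcher"],
    "keywords": ["decentralisation", "archiving"],
    "abstract": "A short abstract of the paper.",
    "document": link(pdf_key),
}
metadata_key = put_block(store, metadata)

# Following the link from the metadata leads back to the exact PDF bytes.
loaded = json.loads(store[metadata_key])
print(store[loaded["document"]["/"]] == pdf_bytes)  # True
```

Serialising with sorted keys matters: the same metadata must always produce the same bytes, and therefore the same identifier.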

I can also extend the linked data to include full author profiles, and link keywords to other data so that similar works can be found. I can even store links to earlier revisions, so that the paper’s full change history can be retrieved starting from the most recent version.
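A revision history can be sketched as a chain of blocks, each linking to the one before it. This is a minimal illustration under the same assumptions as a plain hash-keyed store: the revision fields and helper names are hypothetical, and the document keys are placeholder strings rather than real hashes.

```python
import hashlib
import json
from typing import List, Optional


def put_block(store: dict, value) -> str:
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode()
    key = hashlib.sha256(encoded).hexdigest()
    store[key] = encoded
    return key


def add_revision(store: dict, document_key: str, note: str,
                 previous: Optional[str]) -> str:
    """Create a revision block linking to a document version and to the
    previous revision (None for the first version)."""
    revision = {
        "document": {"/": document_key},
        "note": note,
        "previous": {"/": previous} if previous else None,
    }
    return put_block(store, revision)


def history(store: dict, latest: str) -> List[str]:
    """Follow 'previous' links from the newest revision back to the first."""
    notes = []
    key: Optional[str] = latest
    while key is not None:
        revision = json.loads(store[key])
        notes.append(revision["note"])
        prev = revision["previous"]
        key = prev["/"] if prev else None
    return notes


store = {}
# Placeholder document keys (hypothetical).
v1 = add_revision(store, "hash-of-draft", "first draft", None)
v2 = add_revision(store, "hash-of-reviewed", "after peer review", v1)
v3 = add_revision(store, "hash-of-final", "final version", v2)
print(history(store, v3))  # ['final version', 'after peer review', 'first draft']
```

Because each revision embeds the hash of the one before it, altering any earlier revision would change every later identifier, so the chain is tamper-evident as well as browsable.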

And The Benefit Of All Of This?

  1. Everything is uniquely identified based on its content.
  2. It can’t be silently corrupted, because any corruption is easy to identify: the content no longer matches its identifier.
  3. It’s censorship-resistant, because any change generates a whole new set of identifiers rather than altering the original.
  4. It’s easy to locate, because its identifier never changes, no matter where the file is stored.

CIDs are content identifiers. Metadata is captured as individual documents, so versions can be compared for changes. Items store references to assets and metadata, and a chain of items provides a version history. Each keyword is also a data block, and multiple keywords can be linked to metadata.
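The data model described above can be sketched in the same hypothetical style. Here each keyword is stored as its own block, a metadata block links to the keyword blocks, and an item block links to an asset and its metadata. Because identical keywords serialise to identical bytes, two papers that share a keyword end up linking to the very same block, which is one way keywords could help find similar works. The block layouts, field names and asset keys are assumptions for illustration, not KnowledgeArc Network’s actual schema.

```python
import hashlib
import json
from typing import Dict, List, Set


def put_block(store: Dict[str, bytes], value) -> str:
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode()
    key = hashlib.sha256(encoded).hexdigest()
    store[key] = encoded
    return key


def make_item(store: Dict[str, bytes], title: str, keywords: List[str],
              asset_key: str) -> str:
    """Store each keyword as its own block, a metadata block linking to the
    keyword blocks, and an item block linking to the asset and metadata."""
    keyword_links = [{"/": put_block(store, {"keyword": k})} for k in keywords]
    metadata_key = put_block(store, {"title": title, "keywords": keyword_links})
    return put_block(store, {"asset": {"/": asset_key},
                             "metadata": {"/": metadata_key}})


def keyword_keys(store: Dict[str, bytes], item_key: str) -> Set[str]:
    """Follow item -> metadata and collect the hashes of its keyword blocks."""
    item = json.loads(store[item_key])
    metadata = json.loads(store[item["metadata"]["/"]])
    return {entry["/"] for entry in metadata["keywords"]}


store: Dict[str, bytes] = {}
# Placeholder asset keys (hypothetical).
paper_a = make_item(store, "Paper A", ["ipfs", "archiving"], "hash-of-paper-a")
paper_b = make_item(store, "Paper B", ["archiving", "ethereum"], "hash-of-paper-b")

# Identical keywords hash to the same block, so shared blocks reveal related works.
shared = keyword_keys(store, paper_a) & keyword_keys(store, paper_b)
print([json.loads(store[k])["keyword"] for k in shared])  # ['archiving']
```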

Follow The Developments

Follow our progress by checking out our code on GitLab.

Follow KnowledgeArc Network on LinkedIn.

Join the KnowledgeArc Network discussion on Telegram.
