Overview
The Distributed Storage Protocol allows storing immutable data blobs in the network.
Before storing, blobs are first chunked, then the chunks encrypted, and finally organized in a Merkle tree.
Data chunks form the leaves of the tree, while non-leaf nodes contain pointers to nodes one level down, consisting of their Chunk IDs and content decryption keys.
A Chunk ID is the content hash of a chunk, while the Blob ID is the Chunk ID of the root of the Merkle tree.
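As an illustration of this ID scheme, a minimal sketch in Python; BLAKE2b as the content hash and the node encoding are assumptions made here for the example, not part of the protocol:

```python
import hashlib
from dataclasses import dataclass


def chunk_id(chunk: bytes) -> bytes:
    # Chunk ID = content hash of the (encrypted) chunk.
    return hashlib.blake2b(chunk, digest_size=32).digest()


@dataclass
class ChildPointer:
    # A non-leaf node points to nodes one level down:
    # the child's Chunk ID plus the key needed to decrypt that child's content.
    chunk_id: bytes
    decryption_key: bytes


@dataclass
class TreeNode:
    children: list  # list of ChildPointer

    def encode(self) -> bytes:
        # Hypothetical encoding; the actual wire format is not specified here.
        return b"".join(c.chunk_id + c.decryption_key for c in self.children)


def blob_id(encrypted_root_chunk: bytes) -> bytes:
    # The Blob ID is simply the Chunk ID of the encrypted root node.
    return chunk_id(encrypted_root_chunk)
```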
Chunks are encrypted using convergent encryption, with the encryption key derived from the plaintext content hash combined with a convergence secret.
In combination with the chunking mechanism, this enables deduplicated encrypted storage.
Since convergent encryption uses the hash of the content as the encryption key, without an additional secret it is vulnerable to confirmation attacks: anyone who knows the plaintext can confirm that a node stores a given piece of data.
Hence it is important to share the convergence secret only within the appropriate scope, e.g. globally for public data, or per domain, per topic, or even per blob when no deduplication is needed.
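One way the per-chunk key could be derived is sketched below; the hash function (BLAKE2b) and the keyed construction (HMAC) are illustrative assumptions, not part of this specification:

```python
import hashlib
import hmac


def convergent_key(convergence_secret: bytes, plaintext: bytes) -> bytes:
    # Key the plaintext content hash with the convergence secret: the same
    # plaintext under the same secret always yields the same key, hence the
    # same ciphertext and Chunk ID, which enables deduplication within the
    # scope that shares the secret.
    content_hash = hashlib.blake2b(plaintext, digest_size=32).digest()
    return hmac.new(convergence_secret, content_hash, hashlib.blake2b).digest()[:32]


# Two writers sharing the secret derive the same key for the same chunk;
# without the secret, knowing the plaintext alone is not enough to confirm
# that a stored ciphertext corresponds to it.
assert convergent_key(b"domain secret", b"chunk") == convergent_key(b"domain secret", b"chunk")
assert convergent_key(b"domain secret", b"chunk") != convergent_key(b"other secret", b"chunk")
```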
The Blob ID allows traversing the Merkle tree nodes and verifying that the blob is stored correctly, but does not allow decryption of the content.
This makes it possible to check that storage nodes keep their commitments.
The storage verification protocol may use random sampling and hash-based challenge-response verification for efficiency.
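One possible shape of a hash-based challenge-response round is sketched below, under the assumption that the verifier holds the sampled chunk (or has precomputed responses for it); nothing here is normative:

```python
import hashlib
import hmac
import os


def challenge() -> bytes:
    # Verifier: pick a fresh random nonce for a randomly sampled chunk.
    return os.urandom(32)


def prove(nonce: bytes, stored_chunk: bytes) -> bytes:
    # Storage node: the response can only be computed by a party that
    # actually holds the chunk content.
    return hashlib.blake2b(nonce + stored_chunk, digest_size=32).digest()


def verify(nonce: bytes, expected_chunk: bytes, proof: bytes) -> bool:
    # Verifier: recompute the expected response from its own copy of the
    # chunk (or from responses precomputed before handing the chunk off).
    expected = hashlib.blake2b(nonce + expected_chunk, digest_size=32).digest()
    return hmac.compare_digest(proof, expected)
```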
Storage Commitments
A Storage Commitment allows a node to commit to storing the referenced blob for a certain amount of time. It contains (see the sketch after this list):
- Blob ID that is stored.
- Node ID that stores the blob.
- Expiration, either:
  - Absolute time until which the blob is stored.
  - Epoch number in a given pub/sub topic, until which the blob is stored.
- Request restrictions: additional requirements for nodes that request the blob, such as:
  - Membership of a given domain.
  - Signature by the private key corresponding to the Node ID.
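A minimal sketch of such a commitment record as a data structure; the field names, types, and encoding are illustrative assumptions rather than the protocol's actual representation:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ExpirationKind(Enum):
    ABSOLUTE_TIME = "absolute_time"  # stored until this wall-clock time
    TOPIC_EPOCH = "topic_epoch"      # stored until this epoch of a pub/sub topic


@dataclass
class Expiration:
    kind: ExpirationKind
    value: int                        # Unix timestamp or epoch number
    topic_id: Optional[bytes] = None  # only set for TOPIC_EPOCH


@dataclass
class StorageCommitment:
    blob_id: bytes                    # Blob ID that is stored
    node_id: bytes                    # Node ID that stores the blob
    expiration: Expiration
    # Additional requirements for requesting nodes, e.g. membership of a
    # given domain, or a request signed with the key behind their Node ID.
    request_restrictions: list = field(default_factory=list)
```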
Storage commitments are typically sent to pub/sub topics, and may also be stored in the Name System, which allows assigning mutable named references to a commitment, and thus to the blob it references.
Each node stores the storage commitments it knows about, along with the source each one arrived from, e.g. the pub/sub topic ID or the zone and label in the name system; this source may be used to look for other commitments after the commitment has expired.
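Building on the StorageCommitment sketch above, a node's local index might record each commitment together with its source, so the source can be revisited once the commitment expires; the structure is again only a sketch:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class PubSubSource:
    topic_id: bytes          # commitment arrived via this pub/sub topic


@dataclass
class NameSystemSource:
    zone: str                # commitment was found under this zone and label
    label: str


CommitmentSource = Union[PubSubSource, NameSystemSource]


@dataclass
class IndexedCommitment:
    commitment: "StorageCommitment"  # see the StorageCommitment sketch above
    source: CommitmentSource         # where the commitment arrived from


# Index keyed by Blob ID, so a lookup can find every known commitment for a
# blob, and revisit the sources once those commitments expire.
commitment_index: dict = {}  # Blob ID -> list[IndexedCommitment]
```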
Lookups & requests
Lookup
When looking up a blob, a node first checks whether it already has a local copy.
If not, it consults its local index of storage commitments and performs a remote lookup.
To do so, the node sends a blob request to nodes for which a valid storage commitment is known locally.
If none returns a result, the node may try to send requests to nodes with expired commitments, in case they still store the blob or know where it is stored.
Finally, the node may try to find additional storage commitments from the sources the already known commitments arrived from, or where such commitments can presumably be found, by synchronizing pub/sub topic messages and performing name system lookups.
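Put together, a lookup might proceed roughly as follows; the helpers passed in as parameters (local store access, blob request, commitment discovery) are hypothetical stand-ins for the steps described above:

```python
from typing import Callable, List, Optional


def lookup_blob(
    blob_id: bytes,
    local_get: Callable[[bytes], Optional[bytes]],           # local blob store
    valid_commitments: List,                                  # commitments not yet expired
    expired_commitments: List,                                # commitments past expiration
    request_blob: Callable[[bytes, bytes], Optional[bytes]],  # (node_id, blob_id) -> blob
    discover_commitments: Callable[[bytes], List],            # topic sync / name lookups
) -> Optional[bytes]:
    # 1. Check for a local copy first.
    blob = local_get(blob_id)
    if blob is not None:
        return blob

    # 2. Ask nodes for which a valid storage commitment is known locally.
    for c in valid_commitments:
        blob = request_blob(c.node_id, blob_id)
        if blob is not None:
            return blob

    # 3. Fall back to nodes with expired commitments; they may still store
    #    the blob, or know where it is stored.
    for c in expired_commitments:
        blob = request_blob(c.node_id, blob_id)
        if blob is not None:
            return blob

    # 4. Try to discover further commitments from the sources the known ones
    #    arrived from (pub/sub topic sync, name system lookups) and retry.
    for c in discover_commitments(blob_id):
        blob = request_blob(c.node_id, blob_id)
        if blob is not None:
            return blob
    return None
```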
Request handling
When a node receives a blob request, it may answer either with the content of the blob, if available locally and the node wishes to provide this service, or with storage commitments for the requested blob, if known.
Read permissions specified in the storage commitment may restrict to whom a blob may be served, by setting additional requirements for the requestor, such as membership of a given domain.
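A request handler following this section might look roughly like the sketch below; the restriction check and the decision whether to serve content at all are local policy, and the parameter names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BlobResponse:
    # A node answers with the blob content, or with commitments it knows
    # about for that blob, or with neither.
    content: Optional[bytes] = None
    commitments: List = field(default_factory=list)


def handle_blob_request(
    blob_id: bytes,
    requestor_id: bytes,
    local_get: Callable[[bytes], Optional[bytes]],     # local blob store
    known_commitments: List,                            # commitments for blob_id
    serves_content: bool,                               # node's own policy choice
    restrictions_met: Callable[[bytes, object], bool],  # requestor vs. commitment checks
) -> BlobResponse:
    blob = local_get(blob_id)
    if blob is not None and serves_content:
        # Serve the content only if the requestor satisfies the request
        # restrictions of the commitments, e.g. membership of a given domain.
        if all(restrictions_met(requestor_id, c) for c in known_commitments):
            return BlobResponse(content=blob)
    # Otherwise answer with the storage commitments known for this blob.
    return BlobResponse(commitments=known_commitments)
```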