Plan For Disruptions: Networking in a Disconnected World

Modern networked applications are generally developed under the assumption of ubiquitous, highly available networks connecting computing devices. This assumption rests on two tenets: all nodes in the network are addressable at all times, and all nodes in the network are contactable at all times. But what if we consider a network environment where not every device present is contactable? What can we learn from building software that operates in a low-availability environment?

The word “networking” can refer both to a set of communicating computing devices (a TCP/IP computer network) and to a set of people meeting to build interpersonal connections (an NYC start-up networking event, for example).

If we frame these two concepts together, we can gain an understanding of how to build applications tolerant of low-availability environments.

We should consider the properties of each of these networking concepts. What makes them similar? What makes them different? In the typical office environment, desktop machines are always plugged into the office backbone network, which is always present and always has internet access through the office’s connection. Regular computing networks are designed for communication in such an environment. IP and Ethernet addresses of network members are assumed to resolve to a machine, and we assume that machines will not change Ethernet addresses (generally they do not change IP addresses either). All functional machines, therefore, are assumed to be addressable and contactable at all times.

This is clearly quite a simplistic example. We can, however, contrast it with the human networking example. In an NYC start-up event, we consider the communication network to be human discussion. When the attendees are mingling over drinks, they are moving in and out of audible range of one another. Not everyone present at the event is immediately contactable by every other attendee, despite being in the same room (network), because not all attendees are part of the same group conversation. All attendees are addressable, however, because everyone is wearing their name badge!

I like to think of these types of networking as lying on opposite ends of a spectrum. On one end we have a “solid” networking state, where computers do not change location, routing paths are assumed to be static (or effectively static), and all connected machines are assumed to be addressable and contactable by all other machines on the network.

At the other end of the spectrum we have a sort of “gaseous” network state, where meet-up attendees form small, disparate networks. Members are available to communicate locally, but are aware of all other attendees who, while they cannot be communicated with at present, are addressable and assumed to be contactable at some point in the future (as attendees mingle).[i]

Most of my work in academia focused on routing protocols designed for low-connectivity environments similar to the human tech meet-up. In these types of networks, a node may know the address of the device it is attempting to contact, yet no end-to-end routing path may exist between them; a direct path may never become available, so the node can never communicate with the destination directly. Nodes must therefore pass messages through intermediate nodes that store and forward messages on each other’s behalf. These types of networks are called opportunistic networks, as nodes pass messages opportunistically whenever the opportunity to do so arises.[ii]
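The store-and-forward behaviour described above can be sketched in a few lines. This is a minimal, epidemic-style illustration (all names are invented for the example; real opportunistic routing protocols are considerably more selective about what they copy and to whom):

```python
class Node:
    """A node in an opportunistic network: it stores messages and
    forwards them whenever it happens to meet another node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.buffer = {}     # msg_id -> (destination, payload), carried for others
        self.delivered = {}  # msg_id -> payload, messages addressed to us

    def send(self, msg_id, destination, payload):
        """Queue a message; it will travel via whoever we meet."""
        self.buffer[msg_id] = (destination, payload)

    def meet(self, other):
        """An opportunistic contact: hand over or copy buffered messages."""
        for msg_id, (dest, payload) in list(self.buffer.items()):
            if dest == other.node_id:
                other.delivered[msg_id] = payload  # final delivery
                del self.buffer[msg_id]
            elif msg_id not in other.buffer:
                other.buffer[msg_id] = (dest, payload)  # epidemic copy
```

Here a message from `a` to `c` is delivered even though `a` and `c` never meet: `a` meets `b`, which later meets `c` and hands the message over.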

A good example of an opportunistic network would be an SMS network of everyone in NYC, where messages could only be passed between phones using Bluetooth, with messages forwarded on each other’s behalf as soon as phones came into range. By exploiting six degrees of separation, messages could travel throughout the city, producing an approximation of a free SMS delivery network, albeit a rather slow one!

Manhattan provides a great location for an opportunistic network due to its high population density and small geographic area.

It’s not hard for me to see the link between this and my new job at AetherWorks, where we develop distributed application software. Consider AetherStore, our distributed peer-to-peer storage platform that presents the free disk space of multiple machines as a single address space. Data can be written to any subset of machines, and the data and backups are automatically synced and managed by the nodes themselves. Like most modern software, it is designed to operate in a heterogeneous network environment.

AetherStore uses a decentralized network where nodes may join and leave at any time, so it operates in a problem space at the convergence of well-connected TCP/IP networks and a disconnection-prone environment. Consider a customer using AetherStore to sync files between two desktop machines and a laptop at their office. They may wish to take their laptop out of the office and work on the stored files in a park. If someone else in the office modifies “the big presentation.ppt” at the same time as our user in the park, there will undoubtedly be conflicts when the two devices sync.

This synchronization may seem like a trivial problem, but it is not. Time stamps cannot be trusted: how do you know which machine has the correct time? Furthermore, how do you construct the differences between the files? One way to quickly determine whether conflicts are present is to construct a tree of changes to files, similar to the approach of modern distributed version control software (e.g. Git, Mercurial). These change-trees are then compared when machines can once again communicate. In our example of the laptop taken to the park, we can immediately draw some parallels to our disconnected network environment. The two desktops in the office and the laptop in the park are part of the same AetherStore network, and yet not all nodes are contactable or addressable by all other nodes.
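The comparison step can be illustrated with a deliberately simplified sketch. AetherStore's actual change-tree format isn't described here, so this uses a linear history of version identifiers per file, as a DVCS fast-forward check would:

```python
def histories_conflict(history_a, history_b):
    """Decide whether two change histories for the same file conflict.

    Histories are lists of version identifiers, oldest first. If one
    history is a prefix of the other, the longer one simply
    fast-forwards (no conflict). Otherwise the file diverged while the
    machines were disconnected, and a conflict must be resolved.
    """
    shorter, longer = sorted((history_a, history_b), key=len)
    return longer[: len(shorter)] != shorter
```

For example, if the office machine extends the shared history while the laptop is unchanged, the laptop can fast-forward; if both extend it independently, the histories diverge and a conflict is flagged.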

By building a system that can handle the difficult, disconnected environment (a state that may not occur often, but must be accounted for), we necessarily cope with the well-connected, high-availability network environment as well.

I present no answers here. I will, however, leave you with a few questions:

  • Should nodes queue change-trees to be exchanged between devices when they meet? How large should such a queue be allowed to grow? Can we discard old change sets?
  • Should certain machines always defer to other machines’ versions of files when conflicts occur?
  • Can we develop a system in which timings of changes can be trusted?
  • Should we take human factors into account? Should we consider employee hierarchy? Is it a good idea to always accept your boss’s changes?
  • Is there a “plasma” state for networks somewhere past “gaseous” on my spectrum? I may not know who is going to turn up (non-addressable), or for how long they will be part of the network (address lifetime).
  • Should we allow users and addresses to be separated?
  • Perhaps we could predict which addresses might be usable in the future on such a network?
  • Should we use address ranges instead of unique addresses?
  • Can nodes share addresses for certain points in time?[iii]


[i] Somewhere in between solid and gaseous networks we have ‘liquid’ state Mobile Ad hoc NETworks (MANETs). In a MANET the routing paths between nodes may change frequently, but all nodes are addressable and contactable at any given point.

[ii] Note the distinction between opportunistic networks and Delay Tolerant Networks. A DTN may assume some periodicity to connections, which gives rise to a different set of routing algorithms.

[iii] For discussion of phase transitions in opportunistic networks:

Fast Reads in a Replicated Storage System

People don’t like slow systems. Users don’t want to wait to read or write to their files, but they still need guarantees that their data is backed up and available. This is a fundamental trade-off in any distributed system.

For a storage system like AetherStore this is especially true, so we attach particular importance to speed. More formally, we have the following goals:

  1. Perform writes as fast as your local disk will allow.
  2. Perform reads as fast as your local disk will allow.
  3. Ensure there are at least N backups of every file.

With these goals, the most obvious solution is to back up all data to every machine so that reads and writes are always local. This, however, effectively limits our storage capacity to the size of the smallest machine (which is unacceptable), so we’ll add another goal.

  4. N can be less than the total number of machines.

Here we have a problem. If we don’t have the data stored locally, how can we achieve reads and writes as fast as a local drive? And if we store data on every machine, how do we allow N to be less than that? With AetherStore, goal 4 is a requirement, so we must satisfy it while meeting goals 1 and 2 as often as possible.

This proves to be relatively trivial for writes, but harder for reads.

Write Fast

When you write to the AetherStore drive your file is first written to the local disk, at which point we indicate successful completion of the write and the operating system returns control to the writing application. This gives writes the speed of the local disk.

AetherStore then asynchronously backs up the file to a number of other machines in the network, providing the redundancy of networked storage.

Files are written locally, then asynchronously backed up to remote machines.
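The write path can be sketched as follows. This is a toy illustration of the "write locally, replicate asynchronously" pattern, not AetherStore's implementation; peers here are simple in-memory stand-ins for remote nodes:

```python
import threading

class Store:
    """Illustrative write path: the local write completes immediately,
    while replication to peers happens asynchronously in the background."""

    def __init__(self, peers):
        self.local = {}      # path -> data held on this machine
        self.peers = peers   # stand-ins for remote nodes
        self._pending = []   # in-flight replication threads

    def write(self, path, data):
        self.local[path] = data  # 1. local write: returns at disk speed
        t = threading.Thread(target=self._replicate, args=(path, data))
        t.start()                # 2. backup proceeds asynchronously
        self._pending.append(t)

    def _replicate(self, path, data):
        for peer in self.peers:
            peer.local[path] = data

    def flush(self):
        """Wait for outstanding replications (for the example only)."""
        for t in self._pending:
            t.join()
```

The caller sees the write complete as soon as the local copy lands; the replicated copies catch up shortly afterwards.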

It is, therefore, possible for users to write conflicting updates to files, a topic touched on in a previous post.

Read Fast

Ensuring that reads are fast is much more challenging, because the data you need may not be stored locally. To ensure that this data is available on the local machine as often as possible, we pre-allocate space for AetherStore and use it to aggressively cache data beyond our N copies.

Pre-Allocating Space

When you start AetherStore on your PC you must pre-allocate a certain amount of space for AetherStore to use, both for primary storage and caching. As an example, a 1 GB AetherStore drive on a single workstation machine can be visualized like this when storing 400 MB:

Replicated data takes up a portion of allocated space.

The 400MB of data indicated in orange is so-called replicated data, which counts towards the required N copies of the data we are storing across the whole system. Each of these files is also stored on N-1 other machines, though not necessarily on the same N-1 machines[1].
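File-by-file replica placement can be sketched with rendezvous (highest-random-weight) hashing. AetherStore's actual placement scheme is not described here, so this is purely illustrative of how different files end up on different machine subsets:

```python
import hashlib

def replica_locations(path, machines, n):
    """Choose n machines to hold copies of a file, file-by-file.

    Ranking machines by a hash of (machine, path) gives every file its
    own deterministic ordering of machines, so replicas of different
    files tend to land on different subsets (rendezvous hashing).
    """
    def score(machine):
        return hashlib.sha256(f"{machine}:{path}".encode()).hexdigest()

    return sorted(machines, key=score)[:n]
```

Every node computes the same answer for the same file, with no co-ordination required.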

Caching Data

The remaining 600 MB on the local drive is unused, but allocated to AetherStore. We use this space to cache as much data as we can, ensuring that we get the read speed of the local drive as often as possible. Cached data is treated differently from replicated data (as illustrated in our example below): it must be evicted from the local store when there is insufficient space for newly incoming replicated data. At that point some read speeds will be closer to those of a networked server.

Cached data takes up space not used by replicated data.

The advantage of this approach is that it lets us store data in as many places as possible while we have the space, and as much data as possible as the drive fills up. In the worst case we’re as good as a networked server; in the best case we’re as good as your local file system.
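The pinned-versus-evictable distinction can be sketched as follows. This is an illustrative model (with least-recently-used eviction assumed, which the post doesn't specify), not AetherStore's actual cache:

```python
from collections import OrderedDict

class AllocatedSpace:
    """Illustrative store: replicated data is pinned, while cached data
    is evicted (oldest first) when replicated data needs the room."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.replicated = {}        # name -> size; counts toward N copies
        self.cache = OrderedDict()  # name -> size; evictable extras

    def used(self):
        return sum(self.replicated.values()) + sum(self.cache.values())

    def add_replicated(self, name, size):
        # Evict cached entries until the incoming replicated data fits.
        while self.used() + size > self.capacity and self.cache:
            self.cache.popitem(last=False)
        if self.used() + size > self.capacity:
            raise MemoryError("insufficient allocated space")
        self.replicated[name] = size

    def add_cached(self, name, size):
        # Cache opportunistically: only when free space already exists.
        if self.used() + size <= self.capacity:
            self.cache[name] = size
```

A read served from either dictionary is local-disk fast; only data in neither must be fetched from another machine.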

More importantly, we are able to store data across only a subset of machines. This allows the capacity of the AetherStore drive to outgrow the capacity of any individual machine, while ensuring that users of every machine can still access all of the data on the drive.

[1] Replica locations are determined on a file-by-file basis, rather than for an entire set of files, so it is unlikely that large groups of files will be backed up to the same set of machines. This also lets us make use of low-capacity machines by replicating fewer files to them than to larger machines.

‘Where is the Server?’

This is the first post of our new series answering some of your most frequently asked questions. We’ll start with the most common query.

Where is the server?

This is easy to answer: there isn’t one.

Other systems that make use of the spare capacity on your servers or workstation machines either require a constant connection to the cloud or require that your machines be connected to a dedicated, centralized server.

In AetherStore, the software running on each of your workstation machines manages and co-ordinates access to data and its backup. These machines co-ordinate among themselves, so there is no need for a central server.

AetherStore's Serverless Architecture vs Server-Centric Architecture

Challenges and Rewards of P2P Software

I love the challenge of creating peer-to-peer systems and the flexibility that they give us.

A well-constructed peer-to-peer system allows us to create applications that work just as well with one hundred machines as they do with two, all without predetermined co-ordination or configuration; applications that don’t rely on a single machine or a specific network topology to run correctly.

With AetherStore, this is precisely what we need. We are creating a software system that eliminates the need for a storage server, instead allowing you to make use of the capacity you already have. If you have ten machines each with 1TB of free storage, AetherStore allows you to combine this capacity to create 10TB [1] of networked, shared storage, without any additional hardware.

With no shared server, we want to avoid making any one machine more important than the others, because we don’t want a single point of failure. We can’t delegate a machine to manage locks for file updates or to determine where data should be stored. Instead we need a system that is able to run without any central co-ordination, and that dynamically up-scales or down-scales as machines start up or fail.

This post discusses one of the ways in which AetherStore achieves this as a peer-to-peer system.

Conflict Resolution

As we have no central server and no guarantee that any one machine will always be active, we have no way of locking files for update — two users can update the same file at the same time and we have no way of stopping them. Instead we need to resolve the resulting conflict.

Consider the following example. When two users concurrently update the same file, we have a conflict. These updates are gossiped to the other machines in the network [2], each of which must independently decide how to resolve the conflict, and must reach the same decision regardless of the order in which the updates were received.


This independent evaluation of conflicts is critical to the scalability of the system and to peer-to-peer architectures in general. If each node makes the ‘correct’ decision without having to contact any other nodes, the system is able to scale without introducing any bottlenecks [3]. This is the advantage of the peer-to-peer architecture, but it is also the challenge.

In the case of AetherStore, to deterministically resolve file conflicts we can use only the two pieces of information available to us: the time of the file update and the identity of the machine making the update. Time is an imperfect comparison, however, because the system clocks of the machines in the network are unlikely to be synchronized. Using machine ID for comparison is even less suitable, because it results in an ordering of updates determined entirely by a user’s choice of machine [4].

Both options are imperfect, but they are the only choices we have without resorting to some form of central co-ordination. Consequently, we use the time of the update (the lesser of two evils) to determine which update takes precedence, with the other, conflicting update added into a renamed copy of the file. If both updates occurred at precisely the same time, we use the machine ID as a tiebreaker [5].
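The timestamp-then-machine-ID rule falls out naturally if each update is represented as a tuple, since Python compares tuples element by element. A minimal sketch (the update representation is invented for illustration):

```python
def resolve(update_a, update_b):
    """Deterministically pick the winning update.

    Each update is a (timestamp, machine_id, data) tuple. The later
    timestamp wins; the machine ID breaks exact ties. Because the rule
    depends only on the updates themselves, every node reaches the same
    answer regardless of the order in which the updates arrive.
    """
    winner = max(update_a, update_b)  # tuple order: timestamp, then machine ID
    loser = update_a if winner is update_b else update_b
    return winner, loser  # the loser is preserved as a renamed copy
```

Crucially, `resolve(a, b)` and `resolve(b, a)` agree, which is what lets each node evaluate conflicts without contacting anyone else.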

Truly Peer-to-Peer

The advantage of this approach is that every machine is an equal peer to every other machine. The failure of one machine doesn’t disproportionately affect the operation of the system, and we haven’t had to add a special ‘server’ machine to our architecture. Also, because each node resolves updates independently, we can easily scale out the system without fear of overloading a single machine.

Machines can be temporarily disconnected, users can take laptops home, a lab can be shut down at night, and the system remains operational [6].

Contrast this with a more traditional setup, where users are reliant on continued connectivity to a single server to have any chance of access to their data.

The key point here is that the removal of any central co-ordination greatly increases the flexibility of the system and its tolerance of failures. In AetherStore we have a system that is resilient to the failure of individual machines and one that seamlessly scales, allowing you to add or reintegrate machines into your network without configuration or downtime.

There is no central point of failure, no bottleneck, and no server maintenance.

And, for this, I love peer-to-peer systems.


[1] You probably want to keep multiple copies of this data, so the total usable space will be somewhat less.

[2] Rather than sending updates to all machines immediately, they are sent to random subsets of machines, eventually reaching them all. This allows us to scale.

[3] This is beautifully illustrated in Chord, which can scale to thousands of nodes with each node needing to know about only a handful of other nodes to participate in the ring.

[4] Tom’s update will always override Harry’s.

[5] This approach is similar to, among other things, the conflict resolution used by CouchDB.

[6] Provided we have provisioned enough copies of user data. This is the topic for another blog post.


Hello and welcome to the AetherWorks blog!

2013 promises to be an exciting year for us, with AetherStore reaching Alpha this week and further releases upcoming.

Over time, we plan to use this blog to discuss some of the more interesting aspects of our work, both technical and operational.

Stay tuned!

Our view over Bryant Park, New York.