Over recent years we have seen the growth of three seemingly orthogonal topics: green computing, small tech businesses, and data analytics. Together, these trends have led to the rise of simple local file storage and sharing applications that provide easy access to critical business data while minimizing cost.
As businesses grow in size, they employ more people, purchase more workstations, and see an overall increase in demand for storage capacity.
These businesses can either provision a server, which is subject to sawtooth-like changes in utilization as demand increases, or they can use a cloud or local storage synchronization application.
Most of these local storage applications require data to be stored on all machines running the storage application. This conflates two separate issues: backup (multiple copies are required for redundancy) and availability (data is accessible from all machines). If all machines have to store a copy of the data, then a new machine must have enough space to store all of the existing data when it joins, which means that machines with small hard disks may not be able to join and access all data. Adding a new machine to your system doesn’t increase your capacity at all, even though your demand is increasing.
What if it were possible to have copies of data available on all machines, while not requiring all machines to keep a copy? This would allow us to support machines with low capacities as well as high capacities, and would ensure that our storage capacity increases as the number of machines available increases.
With AetherStore, we can do this.
Rather than storing all files on every machine in the local file system, AetherStore creates a virtual network drive, which allows all machines to access all data without having to have it all on any single machine. When a user saves a file to the AetherStore drive, the file is backed up across a number of machines on the local network. AetherStore ensures that there are multiple copies of the data for each file available, but it stores these copies on a subset of all machines rather than across every one.
AetherStore abstracts the physical location of file data so all machines in the local AetherStore network can see the files without necessarily having to store a local copy. AetherStore’s storage scales linearly with the number of machines in the network, so more machines means more storage capacity. Your storage capacity grows with the size of your business!
AetherStore splits files into chunks and stores these chunks across the members of the local AetherStore network. By default, each chunk is replicated onto four machines, though this number is customizable. Files are visible on all machines, yet their data lives on only a subset of them, and this disparity is what allows AetherStore to scale efficiently. So what does this mean for your available storage capacity?
The following graph plots storage capacity against the number of machines and compares AetherStore to “Full-Copy” systems that require all machines to store file replicas.
Taking Advantage of All Capacity
Each machine in an AetherStore network can have a different sized hard disk and a different amount of capacity allocated for use by AetherStore. For example, consider six machines running AetherStore on a local network. For the sake of our example, each node is able to store some number of units of data. We can visualize our example AetherStore network:
Let’s now save a small file to AetherStore. In this example we assume each file saved takes up a whole unit of storage. In our example network, there are 23 units’ worth of storage across all machines in the system. AetherStore keeps several copies of each file, so when we save a single-unit file, four replicas are saved across the system.
Figure 4: After saving a one-unit file, we see that the file takes up four units in total across the system. Note that, for redundancy’s sake, the algorithm does not allocate more than a single replica per machine.
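To make the placement rule concrete, here is a minimal Python sketch of saving a one-unit file. It is an illustration under stated assumptions, not AetherStore's actual algorithm: the per-machine capacities (which merely sum to the 23 units in our example), the machine names, and the "most free space first" policy are all hypothetical. The only rules taken from the text are the four replicas and the limit of one replica per machine.

```python
REPLICAS = 4  # default replication factor mentioned in this post


def place_replicas(free_units, replicas=REPLICAS):
    """Place one single-unit file on `replicas` distinct machines.

    Hypothetical policy: prefer the machines with the most free space.
    Returns (updated free-space mapping, list of chosen machines).
    """
    # Only machines with at least one free unit can hold a replica.
    candidates = [m for m, free in free_units.items() if free >= 1]
    if len(candidates) < replicas:
        raise ValueError("not enough machines with free space")

    # Sort by free space (descending), then by name for a stable result.
    # Taking the first `replicas` entries of a list of distinct machines
    # guarantees at most one replica per machine.
    chosen = sorted(candidates, key=lambda m: (-free_units[m], m))[:replicas]

    updated = dict(free_units)
    for machine in chosen:
        updated[machine] -= 1
    return updated, chosen


# Assumed capacities for six machines; they total 23 units.
network = {"A": 6, "B": 5, "C": 4, "D": 4, "E": 3, "F": 1}
after, chosen = place_replicas(network)
print(chosen)               # ['A', 'B', 'C', 'D']
print(sum(after.values()))  # 19 units left after using 4
```

One file of one unit consumes four units in total, and no machine holds more than one of the replicas.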
Because files are split into chunks and each chunk is replicated four times, we can view each chunk as “costing” four times its actual size in terms of the space required.
So how much data can our system store? Let’s save two more single-unit files to AetherStore:
In a system requiring a full copy of all data, the machine with the smallest capacity would be unable to access every file!
Now that we have seen the advantage of this form of data allocation, let’s make the leap to using real units. If a one-unit file requires four units of storage in an AetherStore network, a 1MB file “costs” 4MB to store.
In a full-copy system, by contrast, the file needs a copy on every machine, so a 1MB file would cost 6MB in the six-machine example above. Moreover, once the capacity is full on a machine, that machine can no longer view all of the files. The full replica system uses more of your space and then limits access once a machine’s capacity is reached!
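The arithmetic behind these two costs can be written out as a short Python sketch. The function names are our own, chosen for illustration; the replication factor of four and the six-machine network come from the example above.

```python
def aetherstore_cost_mb(file_mb, replicas=4):
    """Space consumed when a file is stored as a fixed number of replicas."""
    return file_mb * replicas


def full_copy_cost_mb(file_mb, machines):
    """Space consumed when every machine keeps its own copy of the file."""
    return file_mb * machines


print(aetherstore_cost_mb(1))            # 4
print(full_copy_cost_mb(1, machines=6))  # 6
print(full_copy_cost_mb(1, machines=20)) # 20
```

The replicated cost stays fixed at four times the file size, while the full-copy cost keeps climbing with every machine that joins.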
The short equation for the capacity of your AetherStore is:
Local store capacity = Space allocated / 4
The same equation applies to all nodes in the network. Thus the capacity of the total AetherStore system is:
Total AetherStore system capacity = Sum of allocated space across all nodes / 4
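As a rough sketch, the two equations above translate directly into Python. The per-machine allocations below are assumed values that total the 23 units from our example, and the full-copy formula (the smallest machine sets the limit) is our reading of the earlier discussion rather than a formula stated in this post.

```python
def aetherstore_capacity(allocated, replicas=4):
    """Total usable capacity: sum of allocated space divided by the replica count."""
    return sum(allocated) / replicas


def full_copy_capacity(allocated):
    """In a full-copy system every machine holds everything,
    so the smallest machine caps what the whole system can store."""
    return min(allocated)


allocated = [6, 5, 4, 4, 3, 1]          # assumed allocations, 23 units in total
print(aetherstore_capacity(allocated))  # 5.75
print(full_copy_capacity(allocated))    # 1

# Adding a seventh machine with 2 units of allocated space:
grown = allocated + [2]
print(aetherstore_capacity(grown))      # 6.25
print(full_copy_capacity(grown))        # 1
```

Under these assumptions the new machine raises the AetherStore capacity, while the full-copy capacity is still held back by the smallest disk.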
Remember that behind the scenes AetherStore breaks files into chunks, so our example was both a visual metaphor AND a model of AetherStore itself!
In practice, AetherStore is more complicated than this because we utilize caching on all nodes so that repeated access of a file only makes local requests, reducing network traffic and increasing speed (discussed in a previous post).