How to Create an Encrypted, Redundant Drive in Minutes

At this point, our businesses and even our personal lives would be nearly impossible to run without our data. Failing to back it up properly, or trusting the wrong service to do it for us, is no longer an option. Luckily, features that used to come with enterprise-grade prices and complexity are now available to anyone with a computer. Here’s how to create your own encrypted, redundant, chunked, password-protected drive with AetherStore in minutes.

To Get Started: Install AetherStore on your computers with spare storage. You’ll notice an optional AetherStore Bridge component in the installer; you only need to include this component on the machines you’ll use to set up and manage AetherStore.

1. Launch AetherStore Bridge to Start Creating Your Store

Open the AetherStore Bridge and select “Create Store”. Choose a name for your Store.

2. Select Store Size

Choose the size of your Store. The percentages below each option indicate the percentage of free space on each machine that will be allocated to AetherStore. By default, AetherStore will replicate your data four times. 

  • For added customization: choose “Use Custom Create”. From there you can view each one of your machines, include or exclude them from the Store, and set exactly how much space you want each of them to contribute. You can also change the replication factor in Custom Create.
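As a rough sanity check on the size options, you can estimate what a Store might offer from each machine's free space, the percentage allocated, and the replication factor. The sketch below is illustrative only: the function name, the sample free-space figures, and the assumption that usable capacity is simply the contributed space divided by the number of copies are ours, not AetherStore's documented sizing formula.

```python
def estimate_store_capacity_gb(free_space_gb, allocation_pct, replication_factor=4):
    """Estimate usable Store capacity in GB (hypothetical helper).

    free_space_gb: free space on each participating machine, in GB.
    allocation_pct: percentage of each machine's free space given to the Store.
    replication_factor: number of copies kept of each piece of data
        (4 matches the default mentioned above).
    """
    if not 0 < allocation_pct <= 100:
        raise ValueError("allocation_pct must be in (0, 100]")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    contributed = sum(gb * allocation_pct / 100 for gb in free_space_gb)
    # Assumption: every byte is stored replication_factor times.
    return contributed / replication_factor


# Four machines with made-up free-space figures, each contributing 50%.
machines = [400, 250, 350, 600]
print(estimate_store_capacity_gb(machines, allocation_pct=50))
```

Lowering the replication factor in Custom Create raises the estimate, at the cost of fewer copies of each piece of data.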

3. Select Mount Machine

Pick the mount machine for your drive: this is the computer that will be able to view and access the Store. You can change the mount machine at any time after your Store is deployed from the Manage Stores page. Assign any drive letter not currently in use.

4. Create Store

Click “Create Store”. You’ll be prompted to set a password for your Store. Once you’ve entered it, hit “Deploy”, and that’s it! You’ve created an encrypted, redundant, password-protected Store drive.

Check out your new Store on the mount machine, and use it just like you would any other drive.

Sign up for access to AetherStore 2.0 here:

The Storage Resource You’re Not Using

When was the last time you thought of a computer workstation as a storage resource?

Workstations typically ship with at least 500 GB of storage, yet usage information from AetherStore users suggests that, until now, data was stored almost everywhere except on workstations. That’s not necessarily surprising, as storage trapped on individual drives provides little value; but how much unused storage is accumulating as a result?

We gathered data from 520 machines running AetherStore, including both workstations and servers, to see just how much office storage was underutilized:

Average % Available Space per Machine: 73%

Average GB Space Per Machine: 352 GB 

Multiplied by the number of machines in your office, you can imagine how quickly this available storage adds up. In fact, on the 520 AetherStore machines in this data set, there were over 180 terabytes of unused storage space! Imagine how much 180+ terabytes of onsite storage could cost if you had to purchase it outright.

The data makes it apparent just how much storage offices already have when provided the technology to combine and manage it effectively. AetherStore customers in this data set included anywhere between 4 and 65 machines in their deployments, and reclaiming storage resources was surprisingly simple. In fact, the speediest of our users were able to get AetherStore up and running in under eight minutes, creating a multi-terabyte drive in less time than it takes to boil an egg.

Average Number Machines per Store: 9

Average Space Available per Deployment: 3.2 TB
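The figures above can be cross-checked with simple arithmetic. The snippet below only multiplies the published averages; the decimal conversion (1 TB = 1000 GB) is an assumption, and treating the per-deployment figure as "average machines times average free space" is our reading rather than a stated method, although the product does agree with the numbers quoted.

```python
MACHINES_IN_DATA_SET = 520
AVG_FREE_GB_PER_MACHINE = 352
AVG_MACHINES_PER_STORE = 9


def gb_to_tb(gigabytes):
    """Convert GB to TB, assuming decimal units (1 TB = 1000 GB)."""
    return gigabytes / 1000


total_free_tb = gb_to_tb(MACHINES_IN_DATA_SET * AVG_FREE_GB_PER_MACHINE)
per_deployment_tb = gb_to_tb(AVG_MACHINES_PER_STORE * AVG_FREE_GB_PER_MACHINE)

print(f"Unused space across the data set: {total_free_tb:.2f} TB")
print(f"Unused space in an average deployment: {per_deployment_tb:.1f} TB")
```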

No matter what size your office is, if you have a few minutes and some workstations, you have everything you need to start rapidly increasing usable storage capacity. You’ve already paid for the hardware – now you can finally use your space!

Get in touch with AetherStore and find out how much storage is waiting for you in your own network!

Backup for Disaster Recovery

“Backup storage that just works is my primary goal for getting AetherStore into production.”

– Brant Wells, Wellston Technology

Wellston Technology will deploy AetherStore as a central part of their Backup and Disaster Recovery strategy at a client site with 350 machines, producing storage for redundant backup. Brant Wells, Owner & Lead Technologist, leads the AetherStore implementation.

Existing Pain Points:

“In the past, I have dealt with problems where the backup and storage was consistently requiring maintenance, and unreliable in general.” Often, important data was shipped offsite to remote locations, leaving Brant unsure what was actually backed up and how. His biggest pain points were:

  • Uncertainty and lack of visibility into status of backups
  • Constant maintenance required for existing backups

AetherStore Deployment:

Brant’s first AetherStore installation was a test environment across four nodes, creating a 30 GB Store for smaller backups and CD/DVD images. The Store also holds a small set of assorted software packages. Going forward, Brant will expand his Store across more of the 350 client machines to host a much larger backup, and add more software packages to it.

  • Ease of Management: “I was able to install and manage my first Store within minutes after getting the installations done.”
  • Reliability of Storage: “It is nice not to have to worry about losing a node or disk and have it ruin your night’s backups. With AetherStore, we are able to sleep at night knowing we have a good backup storage.”

“The setup process was simple: install the storage package on any number of nodes, and install the Dashboard on one of them. Mount the drive and share it via Windows. I love that I can also push and manage the installations with apps like PDQ Deploy. No muss, no fuss. It just worked!”

Simple Setup

“With AetherStore, we are able to have a reliable storage location for our backups that makes it easy to house backup images and restore from backup when necessary. We don’t have to worry about whether or not the backup will belly-up if we lose a single (or more than one!) hard drive.”

What We Do

If you’re looking at one of our job ads, chances are you want to know more about what we do — what does a software engineering job at AetherWorks actually involve?

We’re currently building AetherStore, a distributed data store. AetherStore runs over the workstations and servers in your organization, harnessing their unused capacity to create a shared, virtual file system. To a user, it appears like any other networked file system, and it requires no extra hardware.

It’s a wonderfully engaging project to work on because we see such a diverse range of activities every day. From the low level handling of calls from Windows Explorer[1], to the task of breaking up[2], encrypting[3], and dispersing files across the network[4], I’ve covered most of my Computer Science education in some form while working on AetherStore.
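To make the chunking idea from footnote [2] concrete, here is a minimal sketch of content-based chunking with a rolling hash over a sliding window. It is not AetherStore's implementation: the window size, hash parameters, chunk-size limits, and function names are all assumptions chosen to keep the example small.

```python
import hashlib
import random

WINDOW = 16            # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 31) - 1
MASK = (1 << 6) - 1    # cut when the low 6 bits of the hash are zero
MIN_CHUNK = 16         # at least WINDOW, so cuts depend only on content
MAX_CHUNK = 256        # forced cut if no boundary is found
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def chunk(data: bytes):
    """Split data at positions chosen by its content, not by fixed offsets."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the oldest byte's contribution.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fingerprints(data: bytes):
    """Fingerprint each chunk so identical chunks can be stored once."""
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"hello" + original  # insert five bytes at the front

shared = fingerprints(original) & fingerprints(edited)
print(len(chunk(original)), "chunks;", len(shared), "still shared after the edit")
```

Because boundaries follow the content, an insertion near the start of a file only disturbs the nearby chunks; with fixed-size chunks every later chunk would shift and stop matching.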

Why I Work Here

My own background is in distributed systems (primarily distributed database systems), so the de-centralized and fault-tolerant design of AetherStore has obvious appeal. I love the challenge of creating these systems because you are so constrained by the fundamental properties of distribution[5], and yet it’s still possible to create systems that scale to many hundreds of thousands of machines[6].

With AetherStore our challenge is in creating a shared file system where every piece of data doesn’t have to reside on every machine. We want to spread user data across machines proportional to their size, but we don’t have a centralized authority to lean on when deciding how to do this. And we don’t even know how many machines should be in the system[7]. It’s brilliantly limiting[8]!
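One well-known way to place data in proportion to machine size without a coordinator is weighted rendezvous (highest-random-weight) hashing. The sketch below illustrates that technique only; the post does not say AetherStore uses it. It also assumes every machine shares the same list of machines and capacities, which is exactly the knowledge the paragraph above says we cannot take for granted. The function names and sample capacities are hypothetical.

```python
import hashlib
import math
from collections import Counter


def _unit_hash(chunk_id: str, machine: str) -> float:
    """Map (chunk, machine) to a repeatable float strictly between 0 and 1."""
    digest = hashlib.sha256(f"{chunk_id}|{machine}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (value + 0.5) / 2**52


def place(chunk_id, capacities, copies=2):
    """Pick `copies` machines for a chunk, favouring larger machines.

    Every machine can run this locally and reach the same answer,
    as long as they all agree on `capacities` (an assumption here).
    """
    def score(machine):
        # Weighted rendezvous score: larger capacity gives a larger score.
        return -capacities[machine] / math.log(_unit_hash(chunk_id, machine))

    return sorted(capacities, key=score, reverse=True)[:copies]


# Hypothetical free capacities in GB.
capacities = {"a": 100, "b": 200, "c": 400, "d": 100}
counts = Counter(place(f"chunk-{n}", capacities, copies=1)[0] for n in range(2000))
print(counts.most_common())
```

With a single copy per chunk, each machine's share of chunks tracks its share of total capacity, and adding or removing a machine only moves the chunks that scored highest on that machine.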

For me, it’s hard to imagine a better job. I love the intellectual challenge of creating a new feature or component, and the satisfaction of being able to craft this into a complete, stable software product. It’s truly an exciting thing to be a part of.

Photo of Angus' Desk
The working environment isn’t bad either!

So, while it’s not easy competing with so many other great New York companies, I think we’ve got a lot to offer. Consider applying!

Our Interviews

My biggest problem with job listings (ours included) is that we specify a set of requirements that invariably turn into clichés, and we don’t explain why we need them or how we test for them. So let’s look at a few, and see why they actually matter more than you might think.

“Software Engineer.”

The job title may seem meaningless, but I love this distinction between software engineers and programmers. We want to know that you craft code to a high standard, and that you understand why ‘it just works’ isn’t enough.

In an interview we’ll ask you to review some (bad) code we’ve written, to gauge your code literacy. We’re looking for someone who has an appreciation for clean code and a quick eye for bugs.

“A solid understanding of object-oriented programming.”

We’re building a complex system and we need to make sure that you’re the type of person that can structure code in a logical and maintainable way.  We’ll ask you to do a short programming assignment to get a feel for your general abilities and experience.

“Fundamental Computer Science Background.”

The work I have described in the previous section is challenging, and it requires that you know the relative efficiency of, say, a linked list and an array, but also that you’re capable of creating your own data structures from time to time. For us, the best indicator of this skill-set is an undergraduate degree in Computer Science. In an interview we’ll ask you an algorithmic question that gives you a chance to demonstrate the breadth of your knowledge.

If you do well enough in these questions then we’ll invite you in for a longer interview, asking you to solve a real problem that we’re actually working on in the office.

To Apply

If the idea of working at AetherWorks appeals to you, I’d urge you to check out our available positions. Alternatively, if you have any questions about this post or our interviews, please feel free to email me (first initial, last name[9]).

 


[1] And I mean every single call. Every time you open a directory, right-click on a file, or save a document, Windows Explorer is providing us with a constant stream of calls asking for information and telling us what to update. Since we’re pretending to be a network mount, we have to handle each of these calls, giving responses either from the local machine or a remote copy. This fascinates me more than it probably should, but it gives you some brief insight into the complexity of the operating systems we use every day without thought.

[2] When you store a file we break it up into chunks, both to make it easier to spread data across the network and to increase de-duplication. There are entire classes of research dedicated to finding ways of doing this efficiently. Content-based chunking, in particular, has some really clever uses for hashing (fingerprinting) algorithms and sliding windows, which dramatically improve de-duplication.

[3] We have to encrypt data at rest and in transit, but this is more challenging than in most systems, where you have a central authoritative server. Without such a server, our encryption architecture represents a trade-off between security and usability.

[4] Deciding where to place data is particularly challenging, since we don’t have a central coordinator that can make this decision. All machines must be in agreement as to where data is placed (so that it can be accessed), but it is expensive to allow them to co-ordinate to make this happen.

[6] Constraints can be catalysts for creativity.

[7] Since we never know how many machines are in the system, we can’t use distributed consensus protocols such as Paxos. These require that a majority of nodes agree on a decision, but if you don’t know how many nodes exist, you don’t know how many nodes form a majority.

[8] The CAP theorem is my favorite (trivial) example of this. Imagine you have 3 machines, and have a copy of some data on each machine. How do you handle an update to that data?

How we handle this update (and anything else in distributed systems) is determined by our response to network partitions – when one set of machines is unable to contact another. If we use a central lock manager to stop conflicting updates we ensure that the data is consistent, but that it will be unavailable if the lock manager cannot be contacted. If we use a majority consensus protocol, we can update our data in the event of a partition, but only if we are in the partition with a majority of nodes. If we assume that neither of these cases is acceptable, we can do away with consistency altogether, allowing updates to each individual copy even when the others are inaccessible. The fundamental properties of a distributed system limit us in each of these options — it’s up to us to decide which is the most appropriate in any given case.

[9] This is our interview puzzle question!

Challenges and Rewards of P2P Software

I love the challenge of creating peer-to-peer systems and the flexibility that they give us.

A well-constructed peer-to-peer system allows us to create applications that work just as well with one hundred machines as they do with two, all without predetermined co-ordination or configuration; applications that don’t rely on a single machine or a specific network topology to run correctly.

With AetherStore, this is precisely what we need. We are creating a software system that eliminates the need for a storage server, instead allowing you to make use of the capacity you already have. If you have ten machines each with 1TB of free storage, AetherStore allows you to combine this capacity to create 10TB [1] of networked, shared storage, without any additional hardware.

With no shared server, we want to avoid making any one machine more important than the others, because we don’t want a single point of failure. We can’t delegate a machine to manage locks for file updates or to determine where data should be stored. Instead we need a system that is able to run without any central co-ordination, and that dynamically up-scales or down-scales as machines start up or fail.

This post discusses one of the ways in which AetherStore achieves this as a peer-to-peer system.

Conflict Resolution

As we have no central server and no guarantee that any one machine will always be active, we have no way of locking out files for update — two users can update the same file at the same time and we have no way of stopping them. Instead we need to resolve the resulting conflict.

Consider the following example. When two users decide to concurrently update the same file, we have a conflict. These updates are gossiped to the other machines in the network [2], which must independently decide how to resolve the conflict and make the same decision regardless of the order in which the updates were received.

Figure: two users concurrently update the same file, creating a conflict.

This independent evaluation of conflicts is critical to the scalability of the system and to peer-to-peer architectures in general. If each node makes the ‘correct’ decision without having to contact any other nodes, the system is able to scale without introducing any bottlenecks [3]. This is the advantage of the peer-to-peer architecture, but it is also the challenge.
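The gossip dissemination mentioned above (see footnote [2]) can be simulated in a few lines. This is a toy model under our own assumptions: synchronous rounds, a fixed fanout of random peers per round, and no message loss. It is not a description of AetherStore's actual protocol.

```python
import random


def gossip_rounds(num_machines=50, fanout=3, seed=1, max_rounds=1000):
    """Simulate spreading one update; return (rounds used, machines reached).

    Each round, every machine that already has the update forwards it to
    `fanout` randomly chosen peers (hypothetical parameters).
    """
    rng = random.Random(seed)
    informed = {0}  # machine 0 made the update
    rounds = 0
    while len(informed) < num_machines and rounds < max_rounds:
        newly_informed = set()
        for sender in informed:
            peers = [m for m in range(num_machines) if m != sender]
            newly_informed.update(rng.sample(peers, fanout))
        informed |= newly_informed
        rounds += 1
    return rounds, len(informed)


rounds, reached = gossip_rounds()
print(f"{reached} machines reached after {rounds} rounds")
```

No machine ever has to contact every other machine, which is what lets this style of dissemination scale.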

In the case of AetherStore, to deterministically resolve file conflicts we can only use one of the two pieces of information available to us: the time of the file update and the identity of the machine making the update. Time is an imperfect comparison, however, because the system clocks of each machine in the network are unlikely to be synchronized. Using machine ID for comparison is even less suitable because it results in an ordering of updates entirely determined by a user’s choice of machine [4].

Both options are imperfect, but they are the only choices we have without resorting to some form of central co-ordination. Consequently, we use the time of the update (the lesser of two evils) to determine which update takes precedence, with the other, conflicting update being saved in a renamed copy of the file. If both updates occurred at precisely the same time, we use the machine ID as a tiebreaker [5].
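A minimal sketch of this rule follows. The details are assumptions made for illustration: that the later timestamp wins, that the larger machine ID wins a tie, and the naming scheme for the renamed copy. The `Update` type and function names are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    filename: str
    timestamp: float  # taken from the updating machine's own clock
    machine_id: str
    content: str


def precedence(update):
    """Order by time first, then machine ID as the tiebreaker."""
    return (update.timestamp, update.machine_id)


def conflict_name(update):
    """Hypothetical naming scheme for the losing update's renamed copy."""
    stem, ext = os.path.splitext(update.filename)
    return f"{stem} (conflict from {update.machine_id}){ext}"


def resolve(updates):
    """Return the files every machine ends up with, whatever the arrival order."""
    ordered = sorted(updates, key=precedence)
    winner, losers = ordered[-1], ordered[:-1]
    files = {winner.filename: winner.content}
    for loser in losers:
        files[conflict_name(loser)] = loser.content
    return files


tom = Update("report.txt", 100.0, "tom", "Tom's edit")
harry = Update("report.txt", 100.0, "harry", "Harry's edit")

# Two machines receive the same updates in opposite orders.
print(resolve([tom, harry]) == resolve([harry, tom]))
```

Because the outcome depends only on the updates themselves and not on the order in which they arrive, each machine can resolve the conflict without contacting any other.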

Truly Peer-to-Peer

The advantage of this approach is that every machine is an equal peer to every other machine. The failure of one machine doesn’t disproportionately affect the operation of the system, and we haven’t had to add a special ‘server’ machine to our architecture. Also, because each node resolves updates independently, we can easily scale out the system without fear of overloading a single machine.

Machines can be temporarily disconnected, users can take laptops home, a lab can be shut down at night, and the system remains operational [6].

Contrast this with a more traditional setup, where users are reliant on continued connectivity to a single server to have any chance of access to their data.

The key point here is that the removal of any central co-ordination greatly increases the flexibility of the system and its tolerance of failures. In AetherStore we have a system that is resilient to the failure of individual machines and one that seamlessly scales, allowing you to add or reintegrate machines into your network without configuration or downtime.

There is no central point of failure, no bottleneck, and no server maintenance.

And, for this, I love peer-to-peer systems.

 


[1] You probably want to keep multiple copies of this data, so the usable space will be lower: roughly the total divided by the number of copies.

[2] Rather than sending updates to all machines immediately, we send them to random subsets of machines, eventually reaching them all. This allows us to scale.

[3] This is beautifully illustrated in Chord, which can scale to thousands of nodes with each node only having to know about a handful of other nodes to participate in the ring.

[4] Tom’s update will always override Harry’s.

[5] This approach is similar to, among other things, the conflict resolution used by CouchDB.

[6] Provided we have provisioned enough copies of user data. This is the topic for another blog post.