If you’re looking at one of our job ads, chances are you want to know more about what we do — what does a software engineering job at AetherWorks actually involve?
We’re currently building AetherStore, a distributed data store. AetherStore runs over the workstations and servers in your organization, harnessing their unused capacity to create a shared, virtual file system. To a user, we appear as any other networked file system, without requiring any extra hardware.
It’s a wonderfully engaging project to work on because we see such a diverse range of activities every day. From the low-level handling of calls from Windows Explorer to the task of breaking up, encrypting, and dispersing files across the network, I’ve covered most of my Computer Science education in some form while working on AetherStore.
Why I Work Here
My own background is in distributed systems (primarily distributed database systems), so the de-centralized and fault-tolerant design of AetherStore has obvious appeal. I love the challenge of creating these systems because you are so constrained by the fundamental properties of distribution, and yet it’s still possible to create systems that scale to many hundreds of thousands of machines.
With AetherStore our challenge is in creating a shared file system where not every piece of data has to reside on every machine. We want to spread user data across machines in proportion to their capacity, but we don’t have a centralized authority to lean on when deciding how to do this. And we don’t even know how many machines should be in the system. It’s brilliantly limiting!
For me, it’s hard to imagine a better job. I love the intellectual challenge of creating a new feature or component, and the satisfaction of being able to craft this into a complete, stable software product. It’s truly an exciting thing to be a part of.
My biggest problem with job listings (ours included) is that we specify a set of requirements that invariably turn into clichés, and we don’t explain why we need them or how we test for them. So let’s look at a few, and see why they actually matter more than you might think.
The job title may seem meaningless, but I love the distinction between software engineers and programmers. We want to know that you craft code to a high standard, and that you understand why ‘it just works’ isn’t enough.
In an interview we’ll ask you to review some (bad) code we’ve written, to gauge your code literacy. We’re looking for someone who has an appreciation for clean code and a quick eye for bugs.
“A solid understanding of object-oriented programming.”
We’re building a complex system and we need to make sure that you’re the type of person who can structure code in a logical and maintainable way. We’ll ask you to do a short programming assignment to get a feel for your general abilities and experience.
“Fundamental Computer Science Background.”
The work I have described in the previous section is challenging. It requires not only that you know the relative efficiency of, say, a linked list and an array, but also that you’re capable of creating your own data structures from time to time. For us, the best indicator of this skill set is an undergraduate degree in Computer Science. In an interview we’ll ask you an algorithmic question that gives you a chance to demonstrate the breadth of your knowledge.
If you do well enough in these questions then we’ll invite you in for a longer interview, asking you to solve a real problem that we’re actually working on in the office.
If the idea of working at AetherWorks appeals to you, I’d urge you to check out our available positions. Alternatively, if you have any questions about this post or our interviews, please feel free to email me (first initial, last name).
 And I mean every single call. Every time you open a directory, right-click on a file, or save a document, Windows Explorer is providing us with a constant stream of calls asking for information and telling us what to update. Since we’re pretending to be a network mount, we have to handle each of these calls, giving responses either from the local machine or a remote copy. This fascinates me more than it probably should, but it gives you some brief insight into the complexity of the operating systems we use every day without thought.
 When you store a file we break it up into chunks, both to make it easier to spread data across the network and to increase de-duplication. There are entire classes of research dedicated to finding ways of doing this efficiently. Content-based chunking, in particular, has some really clever uses for hashing (fingerprinting) algorithms and sliding windows, which dramatically improve de-duplication.
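To make the sliding-window idea concrete, here is a minimal sketch of content-based chunking in Python. It is an illustration only, not AetherStore's implementation: the simple polynomial rolling hash stands in for a proper fingerprinting scheme, and the names `chunk` and `fingerprints` and the constants `WINDOW`, `MASK`, `BASE`, and `MOD` are all assumptions chosen for the example.

```python
import hashlib
import random

WINDOW = 16          # bytes covered by the sliding window (assumed value)
MASK = (1 << 6) - 1  # cut when the low 6 bits of the hash are zero
BASE = 257           # polynomial base for the rolling hash
MOD = (1 << 31) - 1  # large prime modulus


def chunk(data: bytes) -> list[bytes]:
    """Split data at positions chosen by its content, not by fixed offsets."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the oldest byte before adding the new one.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        window_full = i - start + 1 >= WINDOW
        if window_full and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the final chunk
    return chunks


def fingerprints(chunks: list[bytes]) -> set[str]:
    """Identify each chunk by a SHA-256 digest so duplicates can be detected."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


# Hypothetical demo data: a file, and the same file with five bytes prepended.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"hello" + original
shared = fingerprints(chunk(original)) & fingerprints(chunk(edited))
print(f"{len(shared)} of {len(chunk(original))} chunks are shared after the edit")
```

Because each cut depends only on the bytes inside the window, an insertion near the start of a file shifts the boundaries without changing where they fall relative to the content, so most chunks keep the same fingerprint. With fixed-size blocks, the same five-byte insertion would change every block after it.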
 We have to encrypt data at rest and in transit, but this is more challenging than in most systems, where you have a central authoritative server. Without such a server, our encryption architecture represents a trade-off between security and usability.
 Deciding where to place data is particularly challenging, since we don’t have a central coordinator that can make this decision. All machines must be in agreement as to where data is placed (so that it can be accessed), but it is expensive to allow them to coordinate to make this happen.
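One well-known way to get agreement on placement without a per-file conversation is to have every machine compute the answer from the same inputs. The sketch below uses weighted rendezvous (highest-random-weight) hashing as an example of that idea. It is not a description of how AetherStore places data, and it assumes every machine already holds the same list of machines and capacities, which is itself a hard problem without a coordinator. The names `place` and `_unit_hash` and the machine names are hypothetical.

```python
import hashlib
import math
from collections import Counter


def _unit_hash(key: str, node: str) -> float:
    """Hash (node, key) to a float strictly between 0 and 1."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    top52 = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (top52 + 0.5) / 2**52


def place(key: str, capacities: dict[str, float], replicas: int = 2) -> list[str]:
    """Pick `replicas` machines for `key`, favouring larger capacities."""
    scores = {
        # log(u) is negative, so the score is positive and grows with capacity.
        node: -capacity / math.log(_unit_hash(key, node))
        for node, capacity in capacities.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:replicas]


# Hypothetical machines with relative capacities.
capacities = {"server": 4.0, "desk-1": 1.0, "desk-2": 1.0, "laptop": 1.0}
print(place("chunk-42", capacities))

# Count how often each machine is the first choice across many chunks.
first_choice = Counter(
    place(f"chunk-{i}", capacities, replicas=1)[0] for i in range(2000)
)
print(first_choice)
```

Any machine that runs `place` with the same arguments gets the same answer, so no messages are exchanged per file, and machines with more capacity are chosen proportionally more often. Removing a machine only moves the data that was placed on it.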
 Since we don’t know how many machines are in the system at any given time, we can’t use distributed consensus protocols such as Paxos. These require that a majority of nodes agree on a decision, but if you don’t know how many nodes exist, you don’t know how many nodes form a majority.
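The arithmetic behind this is small enough to sketch. The helper names `majority` and `has_quorum` below are made up for illustration and are not taken from Paxos or from AetherStore.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def has_quorum(acks: int, cluster_size: int) -> bool:
    """True if `acks` agreeing nodes form a majority of `cluster_size`."""
    return acks >= majority(cluster_size)


# The same three acknowledgements are a majority of 5 machines but not of 7.
print(has_quorum(3, 5), has_quorum(3, 7))

# Seven machines split 3 / 4. If the smaller side wrongly believes the system
# has only 5 machines, both sides conclude that they hold a majority.
print(has_quorum(3, 5) and has_quorum(4, 7))
```

The second case is the dangerous one: two groups that each believe they hold a majority can make conflicting decisions, which is exactly what a majority protocol is meant to rule out.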
How we handle an update to replicated data (and anything else in distributed systems) is determined by our response to network partitions, when one set of machines is unable to contact another. If we use a central lock manager to stop conflicting updates, we ensure that the data is consistent, but it will be unavailable if the lock manager cannot be contacted. If we use a majority consensus protocol, we can update our data in the event of a partition, but only if we are in the partition with a majority of nodes. If we assume that neither of these cases is acceptable, we can do away with consistency altogether, allowing updates to each individual copy even when the others are inaccessible. The fundamental properties of a distributed system limit us in each of these options; it’s up to us to decide which is the most appropriate in any given case.
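As a rough sketch of the three options in this note, the toy model below asks one question: given the machines a node can currently reach, is it allowed to apply an update? The `Policy` names, the `can_update` function, and the five-machine example are assumptions made for illustration, not part of any real system.

```python
from enum import Enum


class Policy(Enum):
    LOCK_MANAGER = "central lock manager"
    MAJORITY = "majority consensus"
    ANY_COPY = "update any reachable copy"


def can_update(policy: Policy, reachable: set[str],
               all_nodes: set[str], lock_manager: str) -> bool:
    """Decide whether a node that can reach `reachable` may apply an update."""
    if policy is Policy.LOCK_MANAGER:
        # Consistent, but unavailable whenever the lock manager is cut off.
        return lock_manager in reachable
    if policy is Policy.MAJORITY:
        # Available only on the side of the partition holding most nodes.
        return len(reachable & all_nodes) > len(all_nodes) / 2
    # ANY_COPY: always available, but copies may diverge during a partition.
    return True


# Five hypothetical machines, with "a" acting as lock manager, split 2 / 3.
nodes = {"a", "b", "c", "d", "e"}
small_side, large_side = {"a", "b"}, {"c", "d", "e"}
for policy in Policy:
    print(policy.value,
          can_update(policy, small_side, nodes, "a"),
          can_update(policy, large_side, nodes, "a"))
```

Each policy gives up something different: the lock manager leaves the larger side unable to write, majority consensus leaves the smaller side unable to write, and updating any copy lets both sides write at the cost of having to reconcile conflicting copies later.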