St Andrews Programming Competition Winners

As we recently announced on the blog, AetherStore sponsored the St Andrews Programming Competition 2014 on April 7th at the University of St Andrews. Seventy-five participants, including undergraduates, postgraduates, and even staff members, competed to solve a series of coding problems in three hours, using a programming language of their choosing. £200 cash prizes were up for grabs in the sub-honours, honours, postgraduate, and individual student categories. Congratulations to all the winners!

Overall Champion: 

Head of School: Steve Linton

Postgraduate Champion: 

Team: SKI Instructors (Matus Tejiscak, Christopher Swaab, Adam Barwell)

Postgraduate Runner Up: 

Team: Kmp (Daniela Grijincu, Mihai Pitu, Radu Floroiu)

Honours Champion: 

Team: TwoGingersAndAnAsian (Alex Field, Luke Borwell, Ivan King)

Sub-honours Champion: 

Team: Missionary (Kestutis Vilcinskas, Austeja Elvina)

Sub-honours Runner Up: 

Team: Stack Overflow (Tom Dalton, Emily Dick, Neil Wells)

Best Individual Student: 

Nathan Blades

Here are some shots from the competition:

Congratulations to all of the participants, and thanks to the St Andrews computer science department for putting together a great competition!

Brainstorming Blog Ideas: Why you probably already have the material you need

“Start blogging.”

It’s probably the number one piece of advice given to marketers looking to up engagement online. If you put out the right content, the right people will find you. Yet companies (us included!) don’t blog as often or as effectively as we’d like because putting out the “right content” is so much easier said than done. We know first-hand it takes time and serious brainpower to identify the topics where your expertise and your audience’s interests intersect.

As a brainstorming tool, I’m breaking down three of the categories that successful blog posts seem to fall into – based on my own experience and other blogs I’ve read. You’re already familiar with the categories, but I’ve included some questions and examples that may be of help if you’re ever asking (like we often are) “What should we write about this week?”

1. The How-To:

There’s nothing like Googling “How to…” to realize that whatever your problem is, you’re not the first person to look for an answer to it. Tutorial posts are a no-brainer when it comes to creating content that’s valuable to your audience. We’re a software R&D firm, so some of our most successful posts have been written by our developers detailing solutions they’ve found to particularly challenging or interesting programming problems.

Tutorials don’t need to be advanced. I recently needed to make some changes to our AetherStore brochure in Adobe Illustrator, a few of which required Illustrator skills that weren’t in my wheelhouse. I posed my “How-to” questions to the web and found tutorials that gave me the exact tidbits of advice I was looking for. In this case, I didn’t need an expert’s overview of the software; I needed specific instructions that were actionable for someone of any skill level.

2. Reviews & Comparisons

If a blog’s purpose is to unite your insight with your reader’s interest, a knowledgeable review is a great way to get there. From which software to buy to which conferences to attend, peer opinion carries a lot of weight, especially when your organization doesn’t stand to benefit from the review. There are few decisions made today that aren’t pre-researched online.

Blog reviews can be even more helpful when they compare two things directly. At various moments we’ve found ourselves deciding between Jira and YouTrack for task tracking, Optimizely and Unbounce for A/B testing, even Paychex vs. ADP to handle our payroll. We almost always consult blogs that compare and contrast as part of the research we do to make an informed decision – as does the majority of the web. If you’re experienced with a product or service, or well-versed in how two different ones stack up, someone may be searching for your opinion.

3. New Ideas

One of our most popular posts, “The Waiting Game: Fast-Food Queuing Theory,” was written by one of our engineers. It applied our specific skillset, computer science, to a very common problem: long lunch lines. Programmers could study the code, and everyone could appreciate the proposed solution.

If you’ve been musing on a solution to an everyday problem or have ideas on a new way of doing something, why not propose your theories to your blog audience? Whether or not your solution holds up, it could be a great conversation starter.

Per those categories, here’s a list of questions to use as a jumping-off point when trying to brainstorm blog ideas:

  • What problems have you solved recently?
  • Have you learned a new skill, shortcut or technique? (Think beyond tech, too. Did you restructure a meeting format to make it more productive? Send a thank-you email that received a great response? Come up with some great interview questions?)
  • Did you read another tutorial that didn’t answer your question or find one on which you could expand?
  • Have you started using any new software or hardware recently?
  • Have you attended any conferences or events that you could review?
  • Have you switched products or services recently?
  • What daily annoyance drives you crazy? How would you propose to fix it?

We’re always striving to improve the quality of the AetherWorks Blog and reach new audiences. Not every post resonates, but it’s worth keeping a frequently updated blog to help home in on what does interest readers. If you put out the right content, the right people will find you. And if you put out enough content, you have a better chance of putting out the right content.

St Andrews Programming Competition

We’re excited to have AetherStore sponsor the upcoming St Andrews Programming Competition 2014! The AetherWorks team has plenty of St Andrews graduates. Our six alumni have a collective ten computer science degrees, nine of which came from St Andrews, so they’ve logged a lot of hours in the department and look forward to being part of the event.

The programming competition is open to students of all levels and with any amount of programming experience. Teams of three will have three hours to solve a set of programming problems and win £200 worth of prizes. Here’s what the St Andrews School of Computer Science Blog had to say about the competition:

“Generally, programming competitions are aimed at the best programmers, this is a first-of-its-kind competition where students from all levels with any amount of programming experience stand a chance to win a prize. Another unique aspect of this competition is that it has also open to members of staff from the School of Computer Science, making this a fun experience and a bonding opportunity for staff and students.”

The competition will be held Monday, April 7th from 2pm-5pm GMT, and AetherStore will also be providing refreshments. Click here for more details and registration information. Best of luck to all of the participants!

A photo from back in the day: Angus and Greg in the computer science lab in 2010

Meet the IT

It’d be difficult to develop software that makes life easier for IT Pros without understanding what they’re dealing with, so we appreciate every insight we can get into what a day at the office looks like for an IT Administrator (as it turns out, no two days at the office really look the same). Throughout AetherStore’s development we’ve had the opportunity to speak with some awesome IT Pros, and one of our Spiceworks beta testers agreed to let us publish some info on what makes him tick so everyone can share his insight.

Meet Shuey, a passionate IT Pro with a unique talent!

Name: John Schuepbach (“Shuey”)

Role: Network/Systems Administrator


Can you briefly describe your role as a Network/Systems Administrator?

“I support a staff of approximately 250 users, and my IT team consisting of 6 people. Pretty much anything that has to do with IT (hardware, software, networking, printers, servers, etc), I support it. Heck, some staff even think that ‘IT’ also means ‘building maintenance’, ‘janitor’ and ‘free home IT support’ LOL.”

What’s the breakdown of a typical day for you, in terms of tasks or areas of focus?

“What’s really cool about my job is that no day is ‘typical’. On my ‘best’ days, I may only get one phone call during the entire 8 hours I’m on the clock. And those are the days that I dig into projects and keep charging forward to get as much done as I can, because… On my ‘worst’ days, I’m so busy putting out fires that I’ll have one person on-hold, one person I’m talking to, and another person calling in! And it’s next to impossible to dive into anything on days like that.”

What’s your biggest pain point, what makes your job most difficult?

“Politics!! Whether it’s users who think that THEIR problem is the ONLY thing I have on my plate, or finding a way to help upper management understand what’s REALLY going on in the IT department, or conveying the importance of the need to spend money in order to maintain and effectively grow IT infrastructure, it always seems like political red tape is the biggest hurdle.”

What’s the most helpful tool in your IT arsenal?

“Having a strong ability to remotely troubleshoot and fix problems. This might seem like a simple answer, but I can’t tell you the number of times I’ve seen IT staff who either 1. Don’t have a strong enough foundation to know how to remotely support staff, or 2. Don’t think to use it. When a user has a problem, and their success depends on how quickly you can troubleshoot and fix the problem, leaving your ‘battle station’ to physically deal with the issue often takes extra time that ends up turning into wasted time. Plus, it never seems to fail that when you leave your desk to go take care of something, that’s when another user calls in and needs help; but now you don’t have access to all your ‘IT Tools’ because you’ve left your battle station!”

What’s your favorite part of your role?

“Things I really enjoy working on are big projects that take several hours over the course of a few days, to as many as hundreds of hours over the course of a couple of months. Especially things that involve revamping something to make it a lot better, or building something from the ground up (cleaning up an existing group policy implementation, setting up a WSUS server, creating scripts to simplify tasks, and documenting procedures and policies; to name a few).”

Hidden Talent:

Shuey is world-famous for his Tetris creations (check them out here)! His videos have been featured on TV shows in Japan, Australia and the UK, as well as on popular websites like joystiq.com, kotaku.com and geekologie.com. Here’s an interview he did for HardDrop.

He has also been a self-proclaimed Hardcore Video Gamer since the age of five, and amassed an impressive game collection during his 36-year run before selling off a large portion of it.

Shuey’s “How-tos” and More:

Creating a Java REST API with Jersey (Including Code Example)

In our latest development cycle we’ve been working on creating an official API to manage and control AetherStore. As part of this process I’ve been experimenting with Jersey 2, the Java reference implementation for JAX-RS, the Java API for RESTful Web Services.

This post discusses an example API using Jersey. The example is itself fairly well commented, explaining why certain pieces of code are needed, and how they relate to the project. This post covers the structure of the project and discusses some of its more interesting features.

The example is of an API representing a set, which allows:

  • Strings to be added as a path parameter of a GET request.
  • Strings to be added as part of a PUT request body.
  • The set of all strings stored to be returned.

If you’re using Jersey, it’s important to note that a lot of examples online use Jersey v1, which causes problems because this version of Jersey uses an entirely different namespace — v1 uses com.sun.jersey, whereas v2 uses org.glassfish.jersey, and a number of classes are either named differently, or are in different sub-packages.

The code on GitHub should work straight out of the box if you’re using Eclipse for Java EE.

Areas Covered

The code is useful if you’re interested in one of the following features in relation to Jersey:

  • Running in Eclipse.
  • Setting up appropriate Jersey Maven dependencies.
  • Setting up your web.xml to work with Jersey.
  • Creating a basic API.
  • Using JSON to wrap requests and responses.
  • Using an exception mapper for more readable error handling.
  • Injecting dependencies into resource classes (the API classes).
  • Unit testing Jersey.
  • Mocking calls used by our Jersey API.
  • Unmarshalling responses from a Jersey API.

Reading the Code

This section describes how the code is structured at a high level. There are some comments on specific lines of code, but I’d recommend looking at the comments in the code for a closer look at individual features.

Core

At the core of a Jersey application is the pom.xml file, which specifies all of the dependencies in the code — including Jersey itself — and the versions being used. If you’re using another example, it’s important to note the version of Jersey you’re using. Here we are using Jersey 2.6:

<dependency>
	<groupId>org.glassfish.jersey.containers</groupId>
	<artifactId>jersey-container-servlet</artifactId>
	<version>2.6</version>
</dependency>

The web.xml file specifies how your servlet is named, and where the main Jersey Application class is. This Application class is used to start your Jersey servlet.

<servlet>
	<servlet-name>jersey2-example</servlet-name>
	<servlet-class>org.glassfish.jersey.servlet.ServletContainer</servlet-class>

	<init-param>
		<param-name>javax.ws.rs.Application</param-name>
		<param-value>com.aetherworks.example.jersey2.SetApplication</param-value>
	</init-param>
	<load-on-startup>1</load-on-startup>
</servlet>

In this example our application class is called SetApplication. It registers bindings for a few classes, which we’ll discuss later. At this point the most relevant call to our application is:

packages(true, "com.aetherworks.example.jersey2.api");

This tells the servlet container where to look for the resource classes that form the API. In this case our resource class is called SetResource.

SetResource (API) Class

To recap, this project implements an API which supports two calls to add a string to a set (/set/add/<value> and /set/add), and a single call to get all entries in the set (/set/get). The class definition is annotated (shown below) to set the base path of the API calls to /set:

@Path("/set")
public class SetResource {

Then the methods in this class are further annotated to describe the API calls under this path. For example, the following code is executed when a /set/add/{value} request is made.

@GET
@Path("/add/{value}")
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public Response addSingle(@Context final UriInfo headers, @PathParam("value") final String value) throws InvalidRequestException {
	LOGGER.log(Level.INFO, "Call to " + headers.getPath());
	final boolean added = callHandler.add(value);
	return Response.status(Response.Status.OK).entity(added).build();
}

The full call to this method will be /set/add/{value}, where value is a variable that is mapped to the value parameter of the method as a result of the @PathParam annotation. We want the call to consume and produce JSON, so we specify this in the @Consumes and @Produces annotations. The marshalling to JSON is handled automatically, but we have to specify the marshaller dependency in the pom.xml file, as follows:

<dependency>
	<groupId>com.fasterxml.jackson.jaxrs</groupId>
	<artifactId>jackson-jaxrs-json-provider</artifactId>
	<version>2.3.0</version>
</dependency>

There are two methods which include the /add path, but they differ in path template and HTTP method (GET /set/add/{value} takes the value in the path, while PUT /set/add takes it in the request body), so there is no conflict. The class also contains an @Inject annotation, which tells the container to inject a dependency into the callHandler field. If we look back at the SetApplication class, we can see that this value is injected by registering it with the application through the register() call:

register(new AbstractBinder() {
	@Override
	protected void configure() {
		bind(setCallHandler).to(SetCallHandler.class);
	}
});
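
For context, the bound handler only needs to expose the operations that the resource class calls. The sketch below is a hypothetical, plain-Java stand-in: the class name InMemorySetHandler and the getAll method are assumptions made for illustration, and the real SetCallHandler in the GitHub repository may look quite different. It also assumes that add returns true only when the value was not already stored, which is one plausible reading of the boolean returned by addSingle above.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the logic behind the set API.
// This is NOT the SetCallHandler from the repository; names and behaviour are assumptions.
class InMemorySetHandler {

    // Concurrent set, because a servlet container may serve requests on several threads.
    private final Set<String> values = ConcurrentHashMap.newKeySet();

    // Returns true if the value was newly stored, false if it was already present.
    public boolean add(final String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("value must be a non-empty string");
        }
        return values.add(value);
    }

    // Returns a sorted, read-only snapshot of everything stored so far.
    public Set<String> getAll() {
        return Collections.unmodifiableSet(new TreeSet<>(values));
    }

    public static void main(final String[] args) {
        final InMemorySetHandler handler = new InMemorySetHandler();
        System.out.println(handler.add("a"));    // true: first time "a" is stored
        System.out.println(handler.add("a"));    // false: "a" is already in the set
        System.out.println(handler.getAll());    // [a]
    }
}
```

Keeping this logic in a separate class, rather than in the resource class itself, is what makes it possible to swap in a mock during testing.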

Unit Tests

I’ve written two unit test classes. One, SetApiTests, provides what are essentially end-to-end integration tests, which call the API and check that its operations perform as expected. The second, MockedSetApiTests, provides an example of using mocking to test just the API calls themselves. Both test classes extend JerseyTest, which handles the heavy lifting of setting up a servlet container and providing the API. To run correctly, JerseyTest requires a test framework provider, which is the servlet container used to run the test. In this example I’ve used the Jetty container, where the dependency is specified in the pom.xml file with the following code:

<dependency>
	<groupId>org.glassfish.jersey.test-framework.providers</groupId>
	<artifactId>jersey-test-framework-provider-jetty</artifactId>
	<version>2.6</version>
</dependency>

In the MockedSetApiTests class, the SetCallHandler, which manages the logic behind the API call (and is injected), is mocked out:

@Override
protected Application configure() {
	setCallHandler = Mockito.mock(SetCallHandler.class);
	return new SetApplication(setCallHandler);
}

The configure call is required by JerseyTest to properly configure the application, which in this case requires us to pass the mocked dependency so that it can be injected into the SetResource. In the tests themselves, the call to the API is relatively simple:

@Test
public void mockedAddCall() throws InvalidRequestException {
	final String value = "MyTest1";
	final boolean returnValue = true;
	when(setCallHandler.add(value)).thenReturn(returnValue);

	final Response responseWrapper = target("set/add/" + value).request(MediaType.APPLICATION_JSON_TYPE).get();

	assertEquals(Response.Status.OK.getStatusCode(), responseWrapper.getStatus());
	assertEquals(returnValue, responseWrapper.readEntity(BOOLEAN_RETURN_TYPE));
}

This makes the API call and returns the result (whether a success or a failure) to the responseWrapper. This can then be queried to establish whether the call was successful (by getting the HTTP response code):

assertEquals(Response.Status.OK.getStatusCode(), responseWrapper.getStatus());

If successful, we can then obtain the returned value:

assertEquals(returnValue, responseWrapper.readEntity(new GenericType<Boolean>() {}));

The test addMultipleSingleCall shows an example of an API request with a message body (in this case a set of Strings), also showing how to package up this request parameter in the test:

@Test
public void addMultipleSingleCall() throws InvalidRequestException {
	final Set<String> valuesToStore = new HashSet<String>(Arrays.asList("a", "n", "g", "u", "s", "m", "a", "c"));

	final Entity<Set<String>> requestBody = Entity.entity(valuesToStore, MediaType.APPLICATION_JSON_TYPE);
	target("set/add").request(MediaType.APPLICATION_JSON_TYPE).put(requestBody);

	checkGetCallResponse(performGetCall(), valuesToStore);
}

The examples in this post don’t include any parameters that are non-standard Java types, but doing so is relatively simple. By default only the public fields in the class are serialized, and a default constructor is required, but the class doesn’t have to implement Serializable.
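
As a rough illustration of such a type, the sketch below defines a hypothetical StringPair class that follows the rules just described: public fields, a default constructor, and no Serializable. The class name and its fields are invented for this example and are not part of the example project, and the sketch has not been run against the repository.

```java
// Hypothetical request/response type; the name and fields are illustrative only.
class StringPair {

    // Public fields are what gets serialized by default, per the description above.
    public String key;
    public String value;

    // A default (no-argument) constructor lets the JSON provider instantiate the class.
    public StringPair() {
    }

    // Convenience constructor for application and test code.
    public StringPair(final String key, final String value) {
        this.key = key;
        this.value = value;
    }

    public static void main(final String[] args) {
        final StringPair pair = new StringPair("colour", "blue");
        System.out.println(pair.key + "=" + pair.value);
    }
}
```

Under those assumptions, an instance could presumably be wrapped with Entity.entity(...) and sent in the same way as the Set<String> request body in addMultipleSingleCall.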

How To Run

To run this example in Eclipse for Java EE:

  1. Download / clone the code from GitHub.
  2. In Eclipse, go to File -> New -> Java Project.
  3. Untick ‘Use default location’ and navigate to the path of the jersey2-example repository.
  4. Re-tick the ‘Use default location’ option, which sets up the project name as jersey2-example.
  5. Click Finish to create the project.
  6. To run, either run the unit tests in JUnit, or right-click on the project and select Run as -> Run on Server.

If you want to create your own Eclipse project, you can follow this example, but note that this is for Jersey v1, so you need to adjust the options used in steps 3 and 5 (I’d compare them to the example in my GitHub repo).

To run standalone (the previous steps are required):

  1. Right-click on the project, Export -> WAR File.
  2. Set the location to store the WAR file, and change the specified server runtime if necessary.
  3. Run the WAR in your favorite application server, or standalone with Jetty Runner.

Additional Resources

The following links are resources I found useful in writing this example. Where the examples refer to v1 of Jersey I’ve said so. These examples are included because some part of them is useful, but be careful to note places where v1-specific code is used. This includes anywhere a com.sun.jersey namespace is used.

January Cleanweb Meetup At AetherWorks

We’re excited to be hosting this month’s CleanwebNYC meetup at AetherWorks! The event kicks off Tuesday, January 28th at 6:30pm at the AetherWorks office, located at Bryant Park in Manhattan. We’ll have pizza and beer, and anyone is welcome! RSVP here.

If you’re not familiar, Cleanweb leverages technology to tackle resource problems in energy, water, food, waste, transit and beyond. Check out the CleanwebNYC meetup page for more information on their monthly meetings and how you can get involved, or cleanweb.co to see how the Cleanweb initiative is addressing resource challenges on a global scale.

There will be two presentations Tuesday on solutions that use computer hardware resources more efficiently:

AetherStore

  • We’ll be presenting the storage software we’re developing, AetherStore, which turns unused space on machine hard drives into a shared storage network. AetherStore requires no new hardware and allows organizations to use their existing storage space more efficiently.

Revivn

  • Revivn re-purposes unused technology for social and environmental impact, creating “a new and greater purpose for unused computers.” They use outdated electronics from companies to build out various initiatives helping people gain computer access.

It should be a fun night, so join us to see what’s going on in the CleanwebNYC community – and of course to enjoy some free food and drinks! RSVP now! 

Secret Santa’s Little (Software) Helpers

We decided to do Secret Santa this year at AetherWorks, and we were all pretty excited about it. We put everyone’s name on a piece of paper, threw them in an empty granola bar box, and everyone picked.

I chose last, and in an unfortunate twist, chose my own name. Undeterred, we threw the names back in and tried again. This time Mike picked his own name. It wasn’t until the third drawing that we all managed to pick someone else, and by then we were frustrated by how difficult a simple Secret Santa exchange was for us to execute. How many software engineers does it take to pull off a gift swap?

Set on ridding the world of the inefficiency that is drawing names out of a hat, the development team told me they could “easily” write a program to perform this task. Each of them wrote a program in a different language that assigned gifters and recipients completely at random, such that no one would be buying for themselves.

Most of them didn’t get it exactly right.

You can find all of our solutions on our GitHub page here. 

Our Solutions

See what they came up with here, and read on for Mike Zaccardo’s explanation of where the others went wrong. Use Mike’s code at the end of this post (the one that does it right!) to set up your own gift exchange and save yourself the frustration of a flawed Secret Santa drawing.

“At first, we each implemented solutions like this:

  1. Randomly shuffle the list of participants
  2. Each participant receives a gift from the participant that precedes them in the shuffled list
  3. Each participant buys a gift for the participant that follows them in the shuffled list

Unfortunately, this algorithm has a flaw – it is unable to generate every possible valid combination of assignments. Specifically, the algorithm cannot produce assignments with loops (where person A has to buy for person B, and person B has to buy for person A), as each participant receives a gift from the preceding participant in the shuffled list and gives a gift to the following participant.

After some thought, I implemented a solution that requires more iteration than my colleagues’ but is able to generate every possible valid combination with a uniform likelihood. The algorithm can be generalized like this:

  1. Copy the participants into two lists: buyers and receivers
  2. Randomly shuffle receivers; do nothing to buyers
  3. Check the value at every position in buyers and make sure that the corresponding value at the same position in receivers is not the same (the buyer is not buying for him/herself)
  4. Go back to step (2) if the check in step (3) failed; otherwise we’re done!

Step (2) is equally likely to produce each of the possible permutations, but (3) filters them to only the permutations where each buyer does not buy for him/herself. These types of permutations are known as derangements. The algorithm is therefore equally likely to produce each of the possible derangements, the valid sets of assignments!

As I studied the concept of derangements, I discovered that when choosing Secret Santa assignments, regardless of the number of people involved, there is approximately a 63.2% chance that someone will choose him/herself. To see the derivation, click here. Given that probability of failure, it now makes total sense that we needed three attempts to successfully choose assignments.

When you really think about it, my algorithm is fundamentally the same as what our office did manually, just performed really quickly on a computer – pick randomly until a derangement is found. So I guess it turns out that the correct method really is to just pick from a granola bar box!”

Mike
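
The two approaches Mike describes can be sketched in Java roughly as follows. This is an illustrative reconstruction from his description, not the code from our GitHub page: the class and method names are made up, and it assumes every participant has a unique name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class SecretSantaSketch {

    // Flawed approach: shuffle once, then everyone buys for the next person in the
    // shuffled list (the last person buys for the first). The result is always one big cycle.
    static Map<String, String> assignByCycle(final List<String> participants, final Random random) {
        checkParticipants(participants);
        final List<String> shuffled = new ArrayList<>(participants);
        Collections.shuffle(shuffled, random);
        final Map<String, String> assignments = new LinkedHashMap<>();
        for (int i = 0; i < shuffled.size(); i++) {
            assignments.put(shuffled.get(i), shuffled.get((i + 1) % shuffled.size()));
        }
        return assignments;
    }

    // Rejection sampling: reshuffle the receivers until nobody is matched with themselves.
    static Map<String, String> assignByDerangement(final List<String> participants, final Random random) {
        checkParticipants(participants);
        final List<String> buyers = new ArrayList<>(participants);
        final List<String> receivers = new ArrayList<>(participants);
        do {
            Collections.shuffle(receivers, random);
        } while (hasSelfAssignment(buyers, receivers));

        final Map<String, String> assignments = new LinkedHashMap<>();
        for (int i = 0; i < buyers.size(); i++) {
            assignments.put(buyers.get(i), receivers.get(i));
        }
        return assignments;
    }

    // True if any buyer sits at the same position as their own name in receivers.
    static boolean hasSelfAssignment(final List<String> buyers, final List<String> receivers) {
        for (int i = 0; i < buyers.size(); i++) {
            if (buyers.get(i).equals(receivers.get(i))) {
                return true;
            }
        }
        return false;
    }

    // Assumption: names are unique; with fewer than two people no valid draw exists.
    static void checkParticipants(final List<String> participants) {
        if (participants.size() < 2 || new HashSet<>(participants).size() != participants.size()) {
            throw new IllegalArgumentException("need at least two participants with unique names");
        }
    }

    public static void main(final String[] args) {
        final List<String> people = Arrays.asList("A", "B", "C", "D");
        System.out.println("buyer -> receiver: " + assignByDerangement(people, new Random()));
    }
}
```

With four participants, the cycle approach can only ever produce the 6 single-cycle assignments, while there are 9 valid assignments in total; the three it misses are the ones made up of two swapped pairs.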

You can find all of our solutions on our GitHub page.
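
For reference, the 63.2% figure quoted above follows from the standard count of derangements. This is the usual textbook argument and not necessarily the derivation linked in the post. Writing D_n for the number of derangements of n people, and assuming each draw is a uniformly random permutation:

```latex
\[
D_n = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!}
\quad\Longrightarrow\quad
P(\text{valid draw}) = \frac{D_n}{n!} = \sum_{k=0}^{n} \frac{(-1)^k}{k!}
\;\xrightarrow{\,n \to \infty\,}\; e^{-1} \approx 0.368
\]
\[
P(\text{someone draws their own name}) = 1 - \frac{D_n}{n!} \approx 1 - \frac{1}{e} \approx 0.632
\]
\[
\mathbb{E}[\text{number of draws until a valid one}] = \frac{n!}{D_n} \approx e \approx 2.72
\]
```

The sum converges quickly, which is why the figure barely depends on group size: for n = 6, for example, D_6 = 265 and 265/720 ≈ 0.368. An expected 2.72 draws is also consistent with needing three attempts.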

AetherStore Spiceworks Beta

This past Tuesday we launched a beta with Spiceworks where we have 29 Spiceheads testing AetherStore and reporting back to our private panel.

If you’re not familiar with the Spiceworks community, it’s 4 million IT pros and tech vendors swapping IT knowledge. Spiceworks also provides a service through which you can conduct a private beta with a number of IT pros that are interested in your technology. The Spiceheads on your panel test and provide feedback in a private forum on basically every aspect of your product.

This beta is a big milestone for us, and already the feedback has been really helpful. Once the panel is complete we’ll post an update to share what we’ve learned. In the meantime, whether you’re part of our Spicepanel or just want to learn more about AetherStore, here’s a look back at a few blog posts we’ve written that explain some AetherStore Beta fundamentals:

Or if you’re looking for a slightly more technical read:

Stay tuned for an update on our AetherStore Spiceworks Beta in a couple of weeks. We’re also looking forward to getting this release out to our Early Adopters who have been crucial in getting us this far! If you’re interested in testing AetherStore for yourself, check it out here!

Cradle to the Gravy: Thanksgiving and Your Data

We enjoyed writing our Halloween-themed blog post so much we couldn’t resist putting a Thanksgiving spin on this one as well. On the menu for today: turkey and storage backup.

Our Director of Research, Angus Macdonald, was telling us about The Black Swan: The Impact of the Highly Improbable by Nassim Nicholas Taleb. The book gets at our inability to account for what we haven’t experienced. The namesake example is the discovery of the first black swan in Australia, and how that discovery annihilated what was an “unassailable belief” by the Old World – that all swans were white.

“It illustrates a severe limitation to our learning from observations or experience and the fragility of our knowledge. One single observation can invalidate a general statement derived from millennia of confirmatory sightings of millions of white swans. All you need is one single (and, I am told, quite ugly) black bird.”[1]

How to Learn From the Turkey:

Taleb further explains the phenomenon using the life of the turkey, which is fed and cared for every day of its life by humans. The turkey has no reason to distrust them; each day, the belief that its caretakers strive only to keep it alive is reaffirmed. The reliability of this setup seems to increase with each day, when actually slaughter is becoming increasingly imminent.

Enter: Thanksgiving.

Figure One: One Thousand And One Days Of History (Taleb, 40)

As we can see from the life of the turkey, what has worked in the past is often not an indication of what will work in the future.

So it is with data storage: until you’ve experienced data loss, it’s easy to be lulled into a sense of security by a storage infrastructure that has given no indication it will fail. In reality, each day without incident may be one day closer to data loss. A Google study found that, on average, nearly 15% of all hard disks fail within two years, 22% fail within three, and 35% of disks fail within five.[2]

Keep in mind disk failure is only one way to lose your data; these statistics don’t even cover the countless other ways to experience data loss. And the consequences are dire: “93% of companies that lost their data center for 10 days or more due to a disaster, filed for bankruptcy within one year of the disaster.” And “94% of companies suffering from a catastrophic data loss do not survive – 43% never reopen and 51% close within two years.”[3] The odds aren’t much better than a turkey’s on Thanksgiving eve.

Fortunately, IT experts do have an advantage over turkeys: the ability to learn from others’ experience. Still, even with statistical evidence supporting the need for data backup, when IT budgets are low it can be understandably painful to invest in something with seemingly no immediate gratification. In a Spiceworks study, 30% of SMBs surveyed admitted they aren’t allocating enough resources to backup.[4]

Taleb addresses this incongruity by referencing Bertrand Russell’s “Problem of Inductive Knowledge” – “How can we logically go from specific instances to reach general conclusions?” We know data loss is a possibility, but it’s easy to feel secure when your storage setup shows no signs of weakness, and frustrating to invest in something that appears to offer no instant gratification.

Still, the evidence is not on your side. Data loss is a likely possibility and its consequences are daunting.

Don’t be a turkey, back up your data. Happy Thanksgiving from AetherWorks!


[1] Nassim Nicholas Taleb, “The Black Swan,” Penguin Books Ltd, 2007, 2010.

[2] Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso. Failure Trends in a Large Disk Drive Population. In Proceedings of the 5th USENIX Conference on File and Storage Technologies, February 2007

[3] Unitrends, “7 Shortcuts to Losing Your Data (and Probably Your Job)”

[4] Deni Connor, Spiceworks Voice of IT sponsored by Carbonite, “How SMBs are Backing Up: Solutions, Trends & Challenges” March 2013 http://itreports.spiceworks.com/reports/spiceworks_voice_of_it_backingupsmb_031112.pdf

Storage Capacity That Increases With Your Demand

Over recent years we have seen the growth of three seemingly orthogonal topics: green-computing, small tech businesses, and data analytics. This has led to the rise of simple local file storage and sharing applications that provide easy access to critical business data, while minimizing cost.

As businesses grow in size, they employ more people, purchase more workstations, and see an overall increase in demand for storage capacity.

These businesses can either provision a server, which is subject to sawtooth-like changes in utilization as demand increases, or they can use a cloud or local storage synchronization application.

Most of these local storage applications require data to be stored on all machines running the storage application. This conflates two separate issues: backup (multiple copies required for redundancy) and availability (data is accessible from all machines). If all machines have to store a copy of the data, then a new machine must have enough space to store all of the existing data when it joins, which means that machines with small hard disks may not be able to join and access all data. Adding a new machine to your system doesn’t increase your capacity at all, even though your demand is increasing.

Figure 1: While demand increases with number of machines, server capacity is a step function. The local capacity is almost entirely ignored in favor of server storage.

What if it were possible to have copies of data available on all machines, while not requiring all machines to keep a copy? This would allow us to support machines with low capacities as well as high capacities, and would ensure that our storage capacity increases as the number of machines available increases.

With AetherStore, we can do this.

Rather than storing all files on every machine in the local file system, AetherStore creates a virtual network drive, which allows all machines to access all data without any single machine having to hold all of it. When a user saves a file to the AetherStore drive, the file is backed up across a number of machines on the local network. AetherStore ensures that there are multiple copies of the data for each file available, but it stores these copies on a subset of all machines rather than across every one.

AetherStore abstracts the physical location of file data so all machines in the local AetherStore network can see the files without necessarily having to store a local copy. AetherStore’s storage scales linearly with the number of machines in the network, so more machines means more storage capacity. Your storage capacity grows with the size of your business!

AetherStore splits files into chunks and stores these chunks across the members of the local AetherStore network. The current default number of machines to replicate data onto is four, though this is customizable. The disparity between the visibility of files on all machines and the locality of files split up across a subset of all machines is what allows AetherStore to scale efficiently. So what does this mean for your available storage capacity?

The following graph shows storage capacity against the number of machines, comparing AetherStore to “Full-Copy” systems that require all machines to store file replicas.

Figure 2: The storage capacity available on each machine is the same in both systems. AetherStore’s capacity grows linearly with the number of machines. When using AetherStore, your storage capacity grows with your business.
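
The comparison in Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every machine contributes the same amount of space, the network has at least four machines, and the function names are hypothetical rather than part of AetherStore. The replication factor of four is the default mentioned above.

```python
REPLICAS = 4  # default replication factor described in this post


def full_copy_capacity(per_machine, machines):
    """Usable capacity when every machine must hold a copy of everything.

    Each machine stores the full data set, so the total can never exceed
    the space of a single machine, however many machines join.
    """
    return per_machine if machines > 0 else 0


def aetherstore_capacity(per_machine, machines, replicas=REPLICAS):
    """Usable capacity when each piece of data lives on `replicas` machines.

    Assumption: at least `replicas` machines are present.
    """
    return per_machine * machines / replicas


if __name__ == "__main__":
    # Hypothetical network where each machine offers 100 units of space.
    for n in (4, 8, 16):
        print(n, full_copy_capacity(100, n), aetherstore_capacity(100, n))
```

With exactly four machines the two approaches offer the same capacity; beyond that, the full-copy figure stays flat while the AetherStore figure keeps growing.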

Taking Advantage of All Capacity

Each machine in an AetherStore network can have a different sized hard disk and a different amount of capacity allocated for use by AetherStore. For example, consider six machines running AetherStore on a local network. For the sake of our example, each node is able to store some number of units of data. We can visualize our example AetherStore network:

Figure 3: Not all of the machines have the same capacity of available units. Each cylinder represents one unit of storage space. Note no data is stored at this point.

Let’s now save a small file to AetherStore. In this example we assume each file saved takes up a whole unit of storage. In our example network, there are 23 units’ worth of storage across all machines in the system. We know that AetherStore keeps four copies of the data, so if we save a single-unit file, four replicas are saved across the system.

Figure 4: After saving a one-unit file, we now see that the file takes up four units across the system. Note that the algorithm does not allocate more than a single replica per machine for redundancy’s sake.

Because files are split into chunks and each chunk is replicated four times, we can view each chunk as “costing” (in terms of space required) four times its actual size.
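
The placement rule from the example (four replicas, never more than one per machine) can be sketched as follows. This is not AetherStore's actual algorithm, which the post does not describe: the greedy "most free space first" choice, the function name `place_replicas`, and the per-machine capacities are all assumptions made for illustration. The capacities are chosen only so that they add up to the 23 units in the example.

```python
def place_replicas(free_units, replicas=4):
    """Pick `replicas` distinct machines for a one-unit file.

    `free_units` maps a machine name to its free units and is updated in
    place. Hypothetical policy: prefer the machines with the most free space.
    """
    candidates = [name for name, free in free_units.items() if free >= 1]
    if len(candidates) < replicas:
        raise ValueError("not enough machines with free space")
    # One replica per machine: choose distinct machines, fullest-free first.
    chosen = sorted(candidates, key=lambda name: free_units[name], reverse=True)[:replicas]
    for name in chosen:
        free_units[name] -= 1
    return chosen


# Assumed capacities for six machines, totalling 23 units.
network = {"A": 6, "B": 5, "C": 4, "D": 4, "E": 3, "F": 1}

if __name__ == "__main__":
    holders = place_replicas(network)
    print("replicas stored on:", holders)
    print("free units left:", sum(network.values()))
```

Saving one single-unit file removes four units of free space from the network, one from each of four different machines, which is the "cost" described above.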

So how much data can our system store? Let’s save two more single-unit files to AetherStore:

Figure 5: It is easier to grasp AetherStore’s data allocation after three files have been saved.

In a system requiring a full copy of all data, the machine with the smallest capacity would be unable to access every file!

Now that we have seen the advantage of this form of data allocation, let’s make the leap to using real units. If a one unit file requires four units of storage in an AetherStore network, a 1MB file “costs” 4MB to store.

If this file requires a copy on every machine in a full-copy system, a 1MB file would cost 6MB in the above example. Moreover, once the capacity is full on a machine, that machine can no longer view all of the files. The full replica system uses more of your space and then limits access once a machine’s capacity is reached!

The short equation for the capacity of your AetherStore is:

Local store capacity = Space allocated / 4

The same equation applies to all nodes in the network. Thus the capacity of the total AetherStore system is:

Total AetherStore system capacity = Sum of allocated space across all nodes / 4
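
As a quick check of the arithmetic, the two equations and the per-file cost comparison can be written out in Python. The function names are hypothetical, and the per-machine allocations are assumed values that merely sum to the 23 units used in the example above.

```python
REPLICAS = 4  # default number of replicas described in this post


def local_store_capacity(space_allocated, replicas=REPLICAS):
    """Local store capacity = space allocated / 4."""
    return space_allocated / replicas


def total_system_capacity(allocations, replicas=REPLICAS):
    """Total capacity = sum of allocated space across all nodes / 4."""
    return sum(allocations) / replicas


def aetherstore_cost(file_size, replicas=REPLICAS):
    """Space consumed across the network by one file under AetherStore."""
    return file_size * replicas


def full_copy_cost(file_size, machines):
    """Space consumed when every machine keeps its own copy of the file."""
    return file_size * machines


if __name__ == "__main__":
    allocations = [6, 5, 4, 4, 3, 1]  # assumed split of the 23 units
    print(total_system_capacity(allocations))  # units of user data
    print(aetherstore_cost(1), full_copy_cost(1, machines=6))  # MB for a 1MB file
```

For the six-machine example, a 1MB file costs 4MB under AetherStore and 6MB under a full-copy system, matching the figures in the text.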

Remember that behind the scenes AetherStore breaks files into chunks, so our example was both a visual metaphor AND a model of AetherStore itself!

In practice, AetherStore is more complicated than this because we utilize caching on all nodes so that repeated access of a file only makes local requests, reducing network traffic and increasing speed (discussed in a previous post).