For much of the last 25 years, computer scientists have looked for ways to make use of the unused resources of workstation machines. During this time the capacities of machines have greatly increased, as has the number that are available and under-utilized. But while the focus of this work gradually shifted from local area networks to web-scale systems, the last few years have seen a shift back to the small scale. Here’s why.
The Workstation Revolution
In the late eighties, a group of researchers at the University of Wisconsin was trying to parallelize computation across workstation machines.
They observed large collections of machines, connected but under-utilized, and many users whose demand for computation far exceeded their current capacity. If the unused capacity could be harnessed, it provided a greater degree of parallelism than was previously available, and could do so with existing resources.
Their approach worked, and the resulting system, Condor, is still used today. Indeed, the two fundamental problems of distributed desktop computing are the same now as they were 25 years ago:
The desktop is run for the benefit of its interactive user, so background processes shouldn’t affect that user’s activities.
The desktop is unreliable, and can become unavailable at any time.
Condor was designed such that computations could be paused, making it possible to minimize impact on user activity during busy periods. It was also designed such that computations could be re-run from a previous checkpoint, ensuring the failure of a machine didn’t critically impact results.
It created a pool of computational capacity which users could tap into, which was perfect for local-area networks, because long running computations could be run on remote machines and paused whenever the machine was in use. Since these machines typically resided within a single organization, administrators could ensure that the machines rarely became unavailable for reasons other than user activity.
The Internet Generation
The nineties saw an explosion in the number of workstations that were available, connected, and unused. With the advent of the World Wide Web, the problem of resource utilization had gone global.
The characteristics of this network were vastly different from the LANs used by Condor. Most importantly, with machines outside the control of any one organization, they were less reliable than ever. Worse still, results could not be trusted. A user with a home workstation might have had plenty of unused capacity, but unlike the lab machines typically used by Condor, there was no guarantee that the machine would remain available for any length of time. Even when it was available, with the machine being at a remote site, we now had to consider that its results could be faked or corrupted.
More positively, there were now so many of these machines available that any system able to use even a tiny fraction of their resources could harness great computational power. In fact, there were now so many machines available that systems such as Distributed.net and Seti@Home could take advantage of redundant computation.
Both systems broke computations down into relatively small problems, and sent duplicates of these problems to many machines. The problems were small enough that even a short period of activity would allow the remote machine to complete, and the number of duplicate computations was great enough to ensure that (a) some of the machines would eventually finish the computation, and (b) enough of these machines would finish that their results could be compared, ensuring that corrupted results were ignored.
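The comparison of duplicated results can be sketched as a simple majority vote. This is an illustrative sketch, not the actual scheme used by either system; the `quorum` threshold and worker ids are hypothetical:

```python
from collections import Counter

def accept_result(results, quorum=3):
    """Accept a computed value only if enough independent workers agree.

    `results` maps worker id -> reported value; a value is trusted only
    when at least `quorum` workers returned it, so a corrupted or faked
    result from a single machine is outvoted by the honest majority.
    """
    if not results:
        return None
    value, count = Counter(results.values()).most_common(1)[0]
    return value if count >= quorum else None

# Hypothetical work unit: four honest workers and one corrupted one
reports = {"w1": 42, "w2": 42, "w3": 99, "w4": 42, "w5": 42}
assert accept_result(reports) == 42          # majority wins
assert accept_result({"a": 1, "b": 2}) is None  # too few agreeing results
```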
Forging beyond computation, P2P systems such as Napster and BitTorrent allowed people to use the storage capacity of their workstations to share and distribute data. As Seti@Home did for computation, the key to these systems was storing redundant copies of data across many machines, meaning the system’s operation was not dependent on the continued availability of only a few workstations.
Until recently, this was as far as these systems came. But the ubiquity of multi-machine home networks and an ever-increasing demand for storage have created a new focus: storage on local area networks.
Napster and BitTorrent work well on a large scale, when the sheer number of users makes it unlikely that popular items will be inaccessible, but poorly on a small scale where they provide no guarantees that all data will be backed up.
Workstations and LANs now have the capacity to be used for storage, without affecting the activities of the user of a machine (problem 1). New storage systems are capable of ensuring that there are always enough copies of data to prevent data loss (problem 2).
Companies such as AetherStore (disclaimer: this is us), AeroFS, and BitTorrent Sync are working to make use of this under-utilized storage space.
Why do we do this? Many small companies have storage needs, but no desire for a server (and regulatory compliance issues with the cloud), while many others have one or two servers but no redundancy. The ability to provide backup without additional capital expenditure makes that additional backup or server a much more realistic prospect.
Resource utilization has gone local again. This time the focus is on storage.
At AetherWorks, we leave the office every day to get lunch from one of the many dining establishments in the vicinity of our Bryant Park HQ, and we probably spend most of this time stuck in line.
Different restaurants use different queue strategies, and unfortunately, not all queue strategies are created equal. So which restaurant offers the most efficient queue, letting us eat there more often and waste less time stuck in line?
This post compares the performance of the queues at our favorite lunch spots using a branch of mathematics known as queuing theory.
Generally we can think of purchasing food in a take-out restaurant as involving a series of workstations, each with a separate task that takes some time (e.g. ordering food, instructing workers, collecting food, paying). These stations are generally attended in sequence, and each station takes some time to process one customer. The sequence of stations is known as a pipeline.
When a customer is waiting to be served by a workstation (because the workstation is still dealing with the customer in front), they are idling, which counts as wasted time. Note that idling is distinct from waiting for a station to complete the task associated with your order. For example, processing your payment constitutes an action toward your food order.
I am interested in the proportion of time wasted relative to the time being served, as opposed to the total amount of time wasted.
There are several properties of a queue that we can measure:
Total time in the shop
Time spent waiting for service (that is, in queues between workstations)
Number of people waiting in each queue
Total number of people in the system
Utilization of workstations
Interestingly, high utilization results in long waiting times for the customer. From the customer’s perspective, a queue works better when there is spare capacity and the servers are not constantly busy.
If the proportion of time spent in the queue is high, the restaurant could do things better and improve their system. You have to balance queue efficiency, however, with the cognitive abilities of customers — some queuing systems are so complicated that people can have a hard time figuring out where to go next!
Take-out restaurants have several different approaches to their queuing systems:
Single channel / single station: buying a cupcake
Select a cake and pay, all the while dealing with the same staff member
Single channel / multiple stations: fast food drive-through
Order at one window, then pay and collect food at subsequent windows
Multi-channel / single station: toll booth
Multiple stations each with a separate queue
Multi-channel / multiple stations: lunch deli
Multiple types of food, each from a separate counter
Multiple tills, each with a separate queue
Each restaurant will generally follow one of the above approaches, with the workstations arranged such that a customer moves from one queue to the next. Using a set of known equations derived from the behavior of queues, it’s possible to analyze the properties of a given system of queues. In this analysis I use what is known as an M/M/c queue.
An M/M/c queue assumes that customers arrive according to a Poisson process and that service times are exponentially distributed, and the c accounts for the number of servers at each station. For example, multiple baristas may be making coffee simultaneously for a single queue of waiting customers.
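The headline quantity for an M/M/c station, the mean time a customer spends queuing, comes from the Erlang C formula. A minimal sketch in Python, using the Starbucks-style figures that appear later in this post (one arrival every 200 seconds, a 92-second coffee station with 3 baristas) purely as illustrative inputs:

```python
import math

def mmc_wait(lam, mu, c):
    """Mean queuing time Wq for an M/M/c system (Erlang C formula).

    lam -- arrival rate (customers per second)
    mu  -- service rate per server (customers per second)
    c   -- number of servers
    """
    a = lam / mu      # offered load
    rho = a / c       # utilization per server
    if rho >= 1:
        raise ValueError("unstable queue: utilization >= 1")
    # Erlang C: probability an arriving customer has to wait at all
    top = a**c / math.factorial(c)
    bottom = top + (1 - rho) * sum(a**k / math.factorial(k) for k in range(c))
    p_wait = top / bottom
    return p_wait / (c * mu - lam)

# One customer every 200 s; a 92 s coffee station staffed by 3 baristas
wq = mmc_wait(lam=1 / 200, mu=1 / 92, c=3)
```

With c = 1 this reduces to the familiar M/M/1 result, which makes a handy sanity check.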
I won’t go into detail about the equations here, but you can read the linked articles and look at the code provided. The remainder of this post looks at the queuing systems of several of our favorite restaurants. For each restaurant I list the stations and their respective processing times. Each station has its own queue and a number of servers associated with it (assume one server unless explicitly stated otherwise).
Chipotle, Subway, and Starbucks all have queuing systems through which there is only one path. All customers follow the same path.
Chipotle (burrito restaurant)
There is one pipeline, and each customer must wait for the previous customer to finish at a station before progressing to that station.
Burrito/bowl/tacos – 5 seconds
Rice and meat – 8 seconds
Salsa and salad – 12 seconds
Wrapping and pricing – 9 seconds
Paying – 10 seconds
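The station list above maps directly to code. A small sketch that sums the hands-on service time and identifies the bottleneck station, which caps the pipeline’s throughput:

```python
# Chipotle pipeline from the post: (station, service time in seconds)
stations = [
    ("burrito/bowl/tacos", 5),
    ("rice and meat", 8),
    ("salsa and salad", 12),
    ("wrapping and pricing", 9),
    ("paying", 10),
]

# Total hands-on time a single customer spends being served
total_service = sum(t for _, t in stations)

# The slowest station limits how fast customers can flow through
bottleneck, slowest = max(stations, key=lambda s: s[1])
max_throughput = 3600 / slowest  # customers per hour, at best

print(f"total service time: {total_service} s")
print(f"bottleneck: {bottleneck} ({slowest} s), "
      f"at most {max_throughput:.0f} customers/hour")
```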
With this model, a single slow customer can block the entire chain; this is most noticeable when someone pays by check or cash and holds up everyone behind them.
Subway (sandwich restaurant)
Subway is similar to Chipotle in that there is a single pipeline. The main difference, however, is the presence of a toaster. Customers can opt to have their sandwich toasted, which adds roughly 30 seconds to their service time; customers who skip the toaster can overtake those waiting for it. I have estimated the probability of using the toaster at 80%.
Choice of bread and slicing – 8 seconds
Meat and cheese selection – 17 seconds
Toaster – 32 seconds – probability of use 80%
Salad and dressing – 17 seconds
Paying and wrapping – 15 seconds
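Because only some customers use the toaster, Subway’s expected service time is a probability-weighted sum over the stations. A quick sketch using the figures above:

```python
P_TOAST = 0.80  # estimated probability of using the toaster, from the post

# (station, service time in seconds, probability a customer uses it)
stations = [
    ("bread and slicing", 8, 1.0),
    ("meat and cheese", 17, 1.0),
    ("toaster", 32, P_TOAST),  # only ~80% of customers pass through it
    ("salad and dressing", 17, 1.0),
    ("paying and wrapping", 15, 1.0),
]

# Expected hands-on service time for an average customer
expected_service = sum(t * p for _, t, p in stations)
print(f"expected service time: {expected_service:.1f} s")
```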
Starbucks (coffee shop)
Different Starbucks shops use different models, but in this analysis I assume that there is a single employee who operates the till, taking payments, and that they are backed up by three baristas who make coffee in parallel, fulfilling the orders.
Order and pay – 22 seconds
Wait for coffee – 92 seconds – 3 servers
Multi-pipeline queue system
Some restaurants run several parallel queues of customers, including cases where customers can move between stations in chains of queues. This means we must account for multiple queues of customers, unlike the previous section, where single queues were serviced by multiple employees.
McDonald’s (fast food restaurant)
McDonald’s provides several queues in parallel: the first for ordering and paying, and the second an (invisible) station where customers wait while their food is gathered and served. The time it takes to cook the food is accounted for in the time taken to gather the food items.
Order and pay – 42 seconds – 5 parallel queues
Wait while food is gathered – 102 seconds – 5 servers
Chop’t (Salad restaurant)
Chop’t is a restaurant providing takeout salads. They use the most complicated queuing strategy of all the chains analyzed here. First, customers wait in a line to be placed into one of several queues for workstations that will place ingredients in a bowl. Second, the customer is placed into one of several queues for having their ingredients chopped and a dressing applied. Finally, customers queue in a single line for one of several payment stations.
Chop ingredients and apply dressing – 47 seconds – 6 parallel queues
Pay for salad – 17 seconds – 3 servers
In a single-pipeline system, it pays to minimize the time spent at each station. From the customer’s perspective, it is also better if later stations don’t block them from getting their items (e.g. waiting to pay at Chipotle while your food cools down). In this sense, Starbucks has a better approach, since they take your money first. Many places can’t do this, because customers customize their order while moving through the pipeline: for a simple order like coffee it makes sense to order and pay at the beginning, but this is much harder somewhere like Chop’t.
To compare the stores with a theoretical analysis we need a common arrival rate of customers, so the results below use an arrival rate of one customer every 200 seconds. This is unrealistically low, but if customers arrive faster than they can be served, the queue length tends to infinity and the equations produce no result.
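That stability condition is easy to check in code: a station only reaches steady state when the arrival rate is below its total service capacity, i.e. when utilization (arrival rate × service time / servers) is under 1. A small sketch, using figures from this post as example inputs:

```python
def is_stable(lam, service_time, servers=1):
    """True if an M/M/c station can reach steady state.

    lam          -- arrival rate (customers per second)
    service_time -- mean service time at the station (seconds)
    servers      -- number of parallel servers at the station
    """
    utilization = lam * service_time / servers
    return utilization < 1

lam = 1 / 200  # one customer every 200 seconds, as used in the comparison

# McDonald's food-gathering station: 102 s service, 5 servers -> stable
assert is_stable(lam, 102, servers=5)

# A hypothetical single 300-second station could not keep up even at
# one customer every 200 seconds: its queue would grow without bound
assert not is_stable(lam, 300)
```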
According to this analysis, Subway wastes the largest proportion of its customers’ time, and McDonald’s is the most efficient.
However, this doesn’t account for cases such as a limited initial queue size (because people won’t queue if the line is too long), or where a customer leaves the restaurant halfway through the pipeline. For this more advanced behavior we need to run a simulation.
The theoretical model is under-powered because of the low arrival rate required for a direct comparison of restaurants: customers arrive too infrequently to allow a meaningful comparison of waiting times.
A simulation allows us to perform a much deeper analysis of these queues, including stopping customers from entering the shop if the wait time is too long. For this analysis, if the wait time of customers is larger than 6 minutes, customers do not join a shop’s pipeline.
I simulated 10,000 customers arriving at the restaurants with the same arrival distribution, assuming an arrival rate of one customer every five seconds.
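A full multi-station simulation is in the linked code, but the core mechanics (exponential arrivals and service times, multiple servers, and customers balking when the projected wait exceeds 6 minutes) can be sketched for a single station. The rates below are illustrative, not the exact parameters of any restaurant above:

```python
import random

def simulate(arrival_mean, service_mean, servers, n_customers,
             balk_after=360, seed=42):
    """Single-station M/M/c simulation with balking.

    Customers refuse to join if their projected wait exceeds
    `balk_after` seconds (6 minutes, as in the post).
    """
    rng = random.Random(seed)
    free_at = [0.0] * servers  # time at which each server next frees up
    t = 0.0
    waits, balked = [], 0
    for _ in range(n_customers):
        t += rng.expovariate(1 / arrival_mean)  # next arrival time
        soonest = min(range(servers), key=lambda i: free_at[i])
        wait = max(0.0, free_at[soonest] - t)
        if wait > balk_after:
            balked += 1          # projected wait too long: customer leaves
            continue
        start = t + wait
        free_at[soonest] = start + rng.expovariate(1 / service_mean)
        waits.append(wait)
    return sum(waits) / len(waits), len(waits), balked

avg_wait, served, balked = simulate(
    arrival_mean=5, service_mean=92, servers=3, n_customers=10000)
```

With one arrival every 5 seconds against a 92-second service time and only 3 servers, the station is heavily overloaded, so most customers balk, which is exactly the behavior the theoretical model cannot capture.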
[Results table: average waiting time (seconds), average service time (seconds), and proportion of time wasted for each restaurant]
In these results the Chop’t queue is the most efficient, as a lower proportion of time is wasted. This is most likely because customers can swap between several queues, rather than being funneled down a single pipeline that is prone to blocking. In theory, however, a single-queue, multi-server model is the most efficient.
With this knowledge, is it possible to improve the queues of any of the test restaurants?
For example, if we assume that the Subway sandwich toaster is blocking customers, what happens if we add more toasters into the pipeline?
[Results table: average waiting time (seconds), average service time (seconds), and proportion of time wasted for Subway with 1, 2, and 100 toasters]
With 100 toasters, the number of customers served increases to 246, but the average waiting time also increases. This happens because waiting time is recorded when a customer leaves the store, so the calculation is equivalent to asking people on their way out: “How long did you spend waiting?” It is not a measure of the waiting time of people currently in the pipeline, since not all customers are at the same stage. Because the toaster is the largest time sink, it eventually fills to capacity and blocks, driving waiting times up. More toasters allow more customers to enter the shop before the average waiting time exceeds 6 minutes, and those extra customers end up waiting longer than they would have with a single toaster. It’s probably not in Subway’s best interest to run the extra toasters for such a marginal improvement in customer turnover.
The models I have used in this study do not necessarily reflect the actual times taken in the store, but my estimates are based on experience and are roughly proportionate. If you disagree, feel free to play with the numbers in the code and let us know what you find. It’s also worth noting that these models allow large queues to develop at workstations, which is unlikely to occur in real life.
I’d be interested in seeing some meta-analysis on the economics of the potential increase in revenue from introducing additional workstations, given a particular customer arrival rate and the cost of the additional workstations.
As always, please feel free to comment below with suggestions or thoughts on improvements to the system.