Security vs. Privacy in the Modern World

This week we welcomed our newest software engineer, Isabel Peters, to AetherWorks. In this post she discusses the tradeoff between security and privacy, an area related to her recent Master's project at Imperial College London. She previously studied at the University of St Andrews.

As we store more information digitally, the dangers of data theft are increasingly serious. Do you know how your data is secured?

We live in a digitized world where almost every kind of transaction requires some form of computational data handling. For example, virtually any online purchase requires a user to enter personal information including their name, home address, and credit card information. This information is often necessary – whether for online purchases, for gaining access to restricted buildings, or for border control.

The movement of data into a digital format in some ways allows companies to greatly improve our lives, as we can process information more effectively and efficiently. On the other hand, it also means that our privacy can be breached in many new ways, such as through identity fraud, Internet scams, and other types of cyber-crime.

The threat from cyber-crime requires new levels of security. To better protect sensitive data sent over the network, security weaknesses have been fixed and message encryption has been added to standard protocols such as IPsec and TLS, though these protocols still need to be revised and updated periodically.

In the courts, extensive digital disclosure of personal data has emerged as a new challenge to the notion of privacy. The main issue is that technology is fast and the law is slow. For example, the term ‘privacy’ is nowhere to be found in the US Constitution to date, and non-disclosure policies are mostly voluntary for organizations. This is a dilemma for the consumer who wants to use online services but may be subject to unwanted collection and use of personal data. For these reasons many people are wary of storing data in the cloud.

How is your data secured?

There are different types of protection mechanisms for information stored on a computer or sent through a network. First of all, data that is stored on disk or sent through a network can be protected with cryptography: the application of mathematical techniques to design ciphers and encrypt plaintext so that it is unreadable to anyone who does not hold the cipher key. The degree of security largely depends on the ‘strength’ of the cryptographic cipher, which is determined by both the cryptographic algorithm and the size of the cryptographic key.

For instance, Triple-DES encrypts each data block three times using DES (Data Encryption Standard), whose 56-bit key was initially believed to be sufficiently secure. However, the US Government now recommends key sizes between 80 and 120 bits, because increased computational power left the original DES vulnerable to brute-force attacks. Other encryption algorithms currently approved by the US government (National Institute of Standards and Technology) include SKIPJACK (80 bits) and AES (128, 192, and 256 bits).
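To make this concrete, here is a minimal Java sketch (a rough illustration, not production-ready key management) that encrypts and decrypts a small piece of data with a 128-bit AES key using the standard javax.crypto API; the GCM mode, sample plaintext, and class name are arbitrary choices for the example.

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesExample {
    public static void main(String[] args) throws Exception {
        // Generate a 128-bit AES key (192- and 256-bit keys are also approved).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        // GCM mode needs a random 12-byte IV (nonce).
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        // Encrypt: without the key, the ciphertext is unreadable.
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("name, address, card number".getBytes(StandardCharsets.UTF_8));

        // Decrypt with the same key and IV to recover the plaintext.
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        System.out.println(new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8));
    }
}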

Ultimately, the optimal choice of encryption technique and key size depends on the purpose, the sensitivity of the data, and the computational resources and time available to the organization and/or individual who will use it [1].

How is your data accessed?

If a user has managed to encrypt their data, they still need a means of accessing it, while ensuring that no one else can do the same. Access to data can be managed through various mechanisms. For example, a database can be restricted by an authentication system that uses a type of ‘authenticator’. Authenticators can be:

  • Knowledge-based, such as a password or a ‘secret’ question such as “What is your pet’s name?”
  • Object-based, such as physical possession of a token that grants access to a resource, like a metal key to an apartment
  • ID-based, referring to authenticators that are unique to the individual, such as a passport. A special type of ID-based authenticator is an individual’s biometric, such as the human iris.

What makes a biometric extremely secure is its complete uniqueness to each individual [5], but many people have doubts about such technology because it requires them to give away a very personal, unique, and unchangeable identifier. Whereas a compromised password can be reset, a stolen biometric is irreplaceable, making it particularly sensitive and vulnerable. This represents a key trade-off between privacy and security.

The dilemma: Privacy vs. Security

As attackers become more sophisticated and begin to understand security systems, and as computational power increases, our security requirements must become more stringent. Recent trends show that people are relinquishing more private information in favor of security and self-protection [6]. Consequently, ever-growing connectivity and the need for data storage are in increasing conflict with people’s privacy concerns. This is the predicament highlighted by President Obama: “We can’t have 100% security and also then have 100% privacy and zero inconvenience” [4].

“We can’t have 100% security and also then have 100% privacy and zero inconvenience” – Barack Obama.

What are the implications of this shift?

As the demand for personal information increases, people will face increasingly stark choices between security and privacy. As a user of a system it’s important to figure out what is important to you.

New storage solutions must be designed to store sensitive information and to follow current laws and standards on security and data encryption. Even so, legal frameworks are still lagging behind, and there is an urgent need for new laws and standards that restrict the unchecked collection and abuse of user data and protect users from the illegal distribution of such information on the internet.

However, we also have to accept and understand that in order to provide a certain level of security (e.g. at the airport) sensitive data must be captured and stored. In recent years interesting projects, such as the PrimeLife [2] project, have emerged, introducing new concepts and developments with regard to privacy and identity management, but there is still work to be done [3].

There is an onus on individuals to understand and accept how their data is protected, and on companies to produce products with an adequate degree of protection against modern threats to data security.

 [4] http://rt.com/usa/obama-surveillance-nsa-monitoring-385/

[5] The iris, for instance, is one of the most widely deployed biometrics due to the stability of the human eye over a lifetime, its good protection from the environment and, most importantly, its great mathematical advantage, given its excess of up to 266 degrees of freedom (the number of parameters that may vary independently).

[6] In a survey performed by the Joseph Rowntree Reform Trust, 65% of respondents said that collecting information about citizens on large computer systems is a bad idea. 83% did not approve of government access to phone, mail, and Internet browsing records (source).

 

Workstation Resource Utilization Goes Local (Again)

For much of the last 25 years, Computer Scientists have looked for ways to make use of the unused resources of workstation machines. During this time the capacities of machines have greatly increased, as have the number that are available and under-utilized. But while the focus of this work gradually shifted from local area networks to web-scale systems, the last few years have seen a shift back to the small scale. Here’s why.

The Workstation Revolution

In the late eighties, a group of researchers from the University of Wisconsin were trying to parallelize computation over workstation machines.

They observed large collections of machines, connected but under-utilized, and many users whose demand for computation far exceeded their current capacity. If the unused capacity could be harnessed, it provided a greater degree of parallelism than was previously available, and could do so with existing resources.

Their approach worked, and the resulting system, Condor [1], is still used today. Indeed, the two fundamental problems of distributed desktop computing are the same now as they were 25 years ago:

  1. The desktop is run for the benefit of its interactive user, so background processes shouldn’t affect that user’s activities.
  2. The desktop is unreliable, and can become unavailable at any time.

Condor was designed such that computations could be paused, making it possible to minimize impact on user activity during busy periods. It was also designed such that computations could be re-run from a previous checkpoint, ensuring the failure of a machine didn’t critically impact results.

It created a pool of computational capacity which users could tap into, which was perfect for local-area networks, because long running computations could be run on remote machines and paused whenever the machine was in use. Since these machines typically resided within a single organization, administrators could ensure that the machines rarely became unavailable for reasons other than user activity.

The Internet Generation

The nineties saw an explosion in the number of workstations that were available, connected, and unused. With the advent of the World Wide Web, the problem of resource utilization had gone global.

The characteristics of this network were vastly different to the LANs used by Condor. Most importantly, with machines outwith the control of any one organization, they were less reliable than ever. Worse still, results could not be trusted. For example, a user with a home workstation may have had a lot of unused capacity, but unlike the lab machines typically used by Condor, there was no guarantee that this machine would be available for any length of time. Even if it was available, with the machine being at a remote site, we now had to consider that results could be faked or corrupted.

More positively, there were now so many of these machines available that any system able to use a tiny fraction of their resources would be able to harness great computational power. In fact, there were now so many machines available that systems such as Distributed.net [2] and Seti@Home [3] could take advantage of redundant computation.

Both systems broke computations down into relatively small problems, and sent duplicates of these problems to many machines. The problems were small enough that even a short period of activity would allow the remote machine to complete, and the number of duplicate computations was great enough to ensure that (a) some of the machines would eventually finish the computation, and (b) enough of these machines would finish that their results could be compared, ensuring that corrupted results were ignored.
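To illustrate the idea (this is a simplified sketch, not how Distributed.net or Seti@Home actually validate work units), a coordinator could accept a result only when a majority of the duplicated computations agree on it:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RedundantResults {

    // Accept a result only if a majority of the duplicate computations agree,
    // so that a few corrupted or faked results are simply ignored.
    static String majorityResult(List<String> duplicateResults) {
        Map<String, Integer> counts = new HashMap<>();
        for (String result : duplicateResults) {
            counts.merge(result, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > duplicateResults.size() / 2) {
                return entry.getKey();
            }
        }
        return null; // No majority: re-issue the work unit to more machines.
    }

    public static void main(String[] args) {
        // Three volunteer machines returned answers for the same small problem;
        // one answer is corrupted, so the agreed value "42" is accepted.
        System.out.println(majorityResult(List.of("42", "42", "17")));
    }
}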

Forging beyond computation, P2P systems such as Napster [4] and BitTorrent [5] allowed people to use the storage capacity of their workstations to share and distribute data. Like Seti@Home did for computation, the key to these systems was that redundant copies of data were stored across many machines, meaning the system’s operation was not dependent on the continued availability of only a few workstations.

Local Again

Until recently, this was as far as these systems came. But the ubiquity of multi-machine home networks and an ever increasing demand for storage has created a new focus — storage on local area networks.

Napster and BitTorrent work well on a large scale, when the sheer number of users makes it unlikely that popular items will be inaccessible, but poorly on a small scale where they provide no guarantees that all data will be backed up.

Workstations and LANs now have the capacity to be used for storage, without affecting the activities of the user of a machine (problem 1). New storage systems are capable of ensuring that there are always enough copies of data to prevent data loss (problem 2).

Companies such as AetherStore (disclaimer: this is us)[6], AeroFS [7], and BitTorrent Sync [8] are working to make use of this under-utilized storage space.

Why do we do this? Many small companies have storage needs, but no desire for a server (and regulatory compliance issues with the cloud), while many others have one or two servers but no redundancy. The ability to provide backup without additional capital expenditure makes that additional backup or server a much more realistic prospect.

Resource utilization has gone local again. This time the focus is on storage.

Footnotes

[1] Condor: A Hunter of Idle Workstations

[2] Distributed.net

[3] Seti@Home

[4] Napster

[5] BitTorrent

[6] AetherStore

[7] AeroFS

[8] BitTorrent Sync

Towards Pervasive Personal Data

This week we welcomed our newest software engineer, Lewis Headden, to AetherWorks. In this post he discusses his Bachelor's dissertation, “Towards Pervasive Personal Data”, which looked at issues in Distributed File Synchronization. This is an active area of research for his supervisor at the University of St Andrews, Dr. Graham Kirby.

Prior to AetherWorks, Lewis worked for Amazon Development Centre (Scotland) and spent a number of years freelancing – primarily developing web applications.

Pervasive Personal Data is the vision of a world where data flows autonomously to exactly where a user needs it.

Pervasive Personal Data is defined by intelligent systems taking proactive actions to maximize the availability and consistency of files even in a partitioned network. Systems learn what files the user wants and where they want them, and utilize every available connection to deliver them to the appropriate devices. The world of Pervasive Personal Data is the post-user-configured, post-file-synchronization world.

Why consider Pervasive Personal Data?

As the number of devices in use by the average consumer grows, the media they are consuming becomes fragmented across all their devices. Word processing files may be easily synchronized over the internet, but larger files – such as high resolution photos and video – struggle even on high bandwidth connections. These large media files are increasingly lacking in redundancy (or require a cumbersome backup process) and are often not available when people want them most. Yet if devices and machines worked autonomously to replicate this media then this problem would disappear. Files would be replicated on other devices, providing redundancy, and appear on the user’s other machines as they moved around, increasing their availability.

How is distribution of files in Pervasive Personal Data achieved?

Data transfer in such a system could be achieved in three ways.

First, the system could use the Local Area Network (LAN) to quickly transfer files. Obviously this is limited in usefulness, as the system cannot spread files outside of the network the device is in. Device portability, however, could help solve this. As laptops are often moved between home, work, and other locations, they could synchronize files to computers on all of the networks that they visit.

Second, the system could determine where it has seen high bandwidth connections and log them. By understanding past encounters with these connections, and predicting when it will be connected to them again, it can utilize the excess bandwidth to spread files without congesting the user’s connection. Of course, this is slower than using the LAN in most situations. However, Pervasive Personal Data is largely about synchronizing a user’s files for their personal use. In most cases, if the user is working with a device then it is likely to be connected either to the same LAN as another of their devices or to be used infrequently. In these cases, the system will either be able to quickly copy on the LAN or tolerate the slow transfer (as it is unlikely the machine will be used frequently).

Finally, the system can use a Sneakernet to distribute files. As devices like phones, tablets, and USB drives move between machines and networks, their excess space could be used to disseminate the user’s data.

How does a Pervasive Personal Data system learn preferences?

I previously introduced the idea that a system could make determinations about what files to synchronize. The simplest way to do this would be to bootstrap the system by asking the user about their preferences, with statements to agree or disagree with, such as “I frequently look at recent photos at work”, “I do not want files created at work on my home network”, or “I do not watch my videos on my tablet”. Through this, some simple preferences could be worked out and used to determine the priority of data transfers, as sketched below.
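As a rough illustration only (the file attributes, network names, and priority values are hypothetical), such bootstrap preferences might be encoded as a handful of simple rules:

import java.util.List;

public class TransferPriority {

    // Hypothetical description of a file, used only for this illustration.
    record FileInfo(String name, String type, String createdOn, boolean recent) {}

    // Simple rules derived from the user's answers, e.g. "I frequently look at
    // recent photos at work" and "I do not want files created at work on my
    // home network". Higher numbers mean the file is synchronized sooner.
    static int priorityFor(FileInfo file, String targetNetwork) {
        if (file.createdOn().equals("work") && targetNetwork.equals("home")) {
            return 0;  // never transfer
        }
        if (file.type().equals("photo") && file.recent() && targetNetwork.equals("work")) {
            return 10; // transfer first
        }
        return 5;      // default priority
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(
                new FileInfo("holiday.jpg", "photo", "home", true),
                new FileInfo("budget.xlsx", "document", "work", false));
        for (FileInfo f : files) {
            System.out.println(f.name() + " -> priority " + priorityFor(f, "work"));
        }
    }
}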

Users could tweak these with more complex rules if they found that the desired data was not being synchronized.

On a larger scale, data about what files users actually interact with, and where, could be mined. Through this, a generic set of rules could be developed using machine learning, and adjusted over time with input from the user.

How are networks like “Home” and “Work” determined?

By building a network map based on the computers that a given device in the system interacts with, the system quickly partitions devices into distinct networks. Devices that travel between networks are also identified as key routes.

The system then labels these networks by combining user input with other information such as time of day or the types of modified files. Once it has determined the types of encountered networks, it uses that data to filter the files that are spread between devices.
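As a very rough sketch of this step (the Sighting record and the device names are hypothetical), devices can be partitioned into networks with a simple connected-components pass over which devices have been observed on the same LAN:

import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NetworkMap {

    // Hypothetical observation: two devices were seen on the same LAN.
    record Sighting(String deviceA, String deviceB) {}

    // Group every device that has been observed together into one network
    // (a simple connected-components pass over the sightings).
    static Collection<Set<String>> partition(List<Sighting> sightings) {
        Map<String, Set<String>> groupOf = new HashMap<>();
        for (Sighting s : sightings) {
            Set<String> a = groupOf.computeIfAbsent(s.deviceA(), k -> new HashSet<>(Set.of(k)));
            Set<String> b = groupOf.computeIfAbsent(s.deviceB(), k -> new HashSet<>(Set.of(k)));
            if (a != b) {                       // merge the two groups
                a.addAll(b);
                for (String d : b) groupOf.put(d, a);
            }
        }
        return new HashSet<>(groupOf.values());
    }

    public static void main(String[] args) {
        List<Sighting> sightings = List.of(
                new Sighting("desktop", "nas"),
                new Sighting("nas", "laptop"),
                new Sighting("work-pc", "work-server"));
        // Prints two distinct networks: {desktop, nas, laptop} and {work-pc, work-server}.
        System.out.println(partition(sightings));
    }
}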

What are the problems for Pervasive Personal Data?

There are a number of components that would make up such a system, and few of them are simple to engineer.  A starting point for thinking about these problems would be to ask questions. For example, how does a system:

  • Work out what to synchronize?
  • Decide which route (Sneakernet/LAN/WAN) to use?
  • Predict network availability and the transit of devices?
  • Deal with changing networks and new devices?
  • Simplify the configuration process for users but still meet their demands?

These problems, however numerous, are not insurmountable. A distributed storage system is a key part of the problem; analyzing the network and predicting availability is another. Furthermore, the system would need to learn user preferences in file usage, evaluate route selection when high latency paths like Sneakernets exist, and deal with the unpredictable whims of human beings.

This is an area of research that is still relatively unexplored, with some big challenges to overcome in order to deliver a solution that satisfies users.

Are you a good estimator?

Producing accurate estimations for software projects is notoriously challenging, but why? It all starts with understanding what it takes to make a good estimate.

What is a good estimate?

An estimate is an approximation of something, implying that it is based on uncertainty. Clearly a good estimate is accurate, but since this isn’t always possible, it’s more useful if it at least encodes how uncertain we are.

If I say that a project will be completed in 4 months, I leave out an important piece of information: my confidence in the estimate. It’s unlikely that the project will take exactly 4 months, but is it a low-risk project that might take between 3 and 5 months, or is it based on so many unknowns that it could take over a year? An estimate isn’t made more useful by a narrow range if it is based on little to no understanding of the problem.

This is the point made by Steve McConnell in “Software Estimation: Demystifying the Black Art”, where he argues that the illusion of accuracy can be more dangerous for project estimation than a wide estimate. If we can acknowledge that the estimate is not solid, then we can at least start to improve our knowledge of the problem and begin to make it more accurate.

“Estimates don’t need to be perfectly accurate as much as they need to be useful.” – Steve McConnell.

How good are your estimates?

Perhaps unsurprisingly, most people overestimate their own ability to make accurate estimations.

To show this, McConnell provides a test (which you can try for yourself here), where you have to estimate the answer to 10 questions with a 90% confidence that the correct answer is in the range of your estimation.

Try it, and come back here. How did you do?

Very few people answer these questions with 90% confidence, partly because we are conditioned to believe that a good estimate is a narrow estimate.

In fact, a lot of the comments on the answers page argue that the questions are poor, because you’d have to be an expert to produce any meaningful (accurate, narrow) estimates. But this is precisely the point!

If you can answer with 90% confidence, but with a very wide range, then you are at least acknowledging that you don’t have enough knowledge to accurately answer the question.

And that’s the first step to fixing the problem.

The Multicast Revival: LAN Multicast with Code Example

The rise of datacenters and cloud computing is bringing about a resurgence in the use of Multicast. Previously neglected by networking courses and confined to LANs, Multicast now has new relevance. As a brief introduction, this post covers the basic theory behind IPv4 and Ethernet Multicast and provides a code sample showing how it can be used in practice.

Multicast sits on a spectrum between Unicast and Broadcast (and can operate as Broadcast, but with a higher overhead). Unicast, point-to-point messaging is the dominant communication paradigm in networks today. In future networking environments, however, we may need to consider alternative communication paradigms.

When we think of computer networks, we tend to envisage point-to-point Unicast communication. For example, consider HTTP requests to a website: every host wishing to see the page makes an individual request for that page and receives an individual copy. Thus, if 1000 people wish to view the web page, 1000 copies are sent across the network. This duplication is acceptable for web-pages, as web content is generally static and web requests are created, and arrive, independently of one another[i].

In contrast, imagine a scenario in which requested data goes out of date quickly, or many hosts wish to receive the data simultaneously. Consider, for example, the live stream of a sports event or the delivery of timely stock market information to multiple hosts in a datacenter. In both of these cases, sending out multiple copies of the data is inefficient. Ideally we would address a message to multiple recipients and rely on the network to distribute the message accordingly.

Multicast

Multicast is a form of networking in which messages can be delivered to one or more recipients. The Internet Protocol (IP) is used to deliver data across network boundaries[ii], and we use IP addresses to identify a host across multiple networks (see the IP header figure below). Given that an IP header only has space for one destination address, how does Multicast provide support for multiple recipients[iii]?

(Figure: Example Internet Datagram Header[iv])
The solution is to map a single IP address to multiple recipients.  Routers (and intelligent switches) have built-in support for maintaining these Multicast mappings. I will focus on behavior within a single LAN, rather than between routers.

When a packet destined for one of these Multicast addresses (which is mapped to multiple recipients) arrives at a router or switch, the packet is forwarded such that each link only receives one copy of the message at most. All hosts on a link receive all frames on that link and hosts simply filter out those frames not intended for them.

The key point is that as few copies as possible of the messages are sent. For instance, at the link layer, a frame only needs to be sent to links where group members reside[v].

IGMP

In IPv4, the Class D addresses 224.0.0.0 through 239.255.255.255 are reserved for Multicast. Each IP address represents a group of recipients, so there is space for roughly 268 million (2^28) different sets of recipients (Multicast groups). In practice, however, routers and switches generally do not have enough memory to support this number of groups simultaneously, and there are a number of reserved Multicast addresses used by various protocols[vi].

These Multicast mappings are maintained using the Internet Group Management Protocol (IGMP). When a recipient (host) wishes to join a group, it sends a membership report message to the “all routers” group (224.0.0.2). When the router receives the report, it maps the interface on which it received the report to the group IP address, and any messages sent to the group IP address are forwarded to the host via that interface. Thus, the router only needs to know of one group member on each interface.

The router periodically (roughly every minute) sends out a query to the “all hosts” group (224.0.0.1), to check whether there are any hosts still wishing to receive Multicast messages. All hosts in the network receive this query, but only those hosts wishing to remain members reply with a membership report. Hosts can also leave at any time by sending an explicit leave group message.

All IGMP messages are IP datagrams with the following format:

(Figure: IGMP Packet Format[vii])
There are several types of IGMP message:

  • 0x11 = Membership Query:
    • General Query, used to learn which groups have members on an attached network.
    • Group-Specific Query, used to learn if a particular group has any members on an attached network.
  • 0x16 = Version 2 Membership Report: Used by hosts to declare membership
  • 0x17 = Leave Group: Used by hosts to leave a group

The Data Link Layer (Layer 2)

As with all IP traffic, Multicast relies on the data link layer (layer 2) for delivery. So far, I have only covered the network layer (layer 3) implementation, but it is important to also understand the layer 2 implementation.

To deliver to an end host, an IP Multicast address must be converted into an Ethernet address because Ethernet addresses are required for link layer delivery. Ethernet has its own rules for handling Multicast frames, as it can be used independently of IPv4 Multicast[viii].

An Ethernet address consists of 6 bytes, split into two three-octet identifiers. The first three bytes are the Organizationally Unique Identifier (OUI), which is generally that of the manufacturer of the network card. The second three bytes are the Network Interface Controller (NIC) specific part[ix]. Any frame whose destination address has the OUI 01-00-5E is considered to be a Multicast frame, meaning that Ethernet Multicast addresses fall into the range 01-00-5E-00-00-00 to 01-00-5E-7F-FF-FF.

Generally, a switch will only forward frames on an interface where it knows the destination Ethernet address resides. In contrast, a Multicast destination address in an Ethernet frame does not map to a known host in the network, so the switch needs to treat a Multicast frame differently than a standard Ethernet frame. A basic switch should forward all Multicast-addressed frames on all interfaces, because the Multicast OUI starts with 01, whose low-order group bit signifies that the frame should be treated like a Broadcast frame.

By making use of the Broadcast octet, Multicast can be supported on both vanilla and feature-rich switches. A dumb bridge might just forward all packets on all interfaces; a standard off-the-shelf switch with no knowledge of Multicast will forward the Multicast packets on all interfaces due to the presence of the Broadcast octet. An intelligent switch, however, will maintain its own Multicast group mappings to use for forwarding.

By maintaining a mapping of Multicast addresses and only forwarding Multicast frames on links where group members are present, an intelligent switch can avoid needlessly congesting links in the layer 2 network with Multicast frames. Intelligent switches are therefore highly valued in datacenter Multicast deployments.

Intelligent switches can snoop IGMP messages as they pass through. They maintain their own mapping of interfaces to groups (in some cases MAC addresses to groups), and send IGMP reports upstream. By performing this IGMP snooping, switches avoid the congestion caused by many IGMP reports being sent from hosts to the router (this has partly been addressed in IGMPv3 which supports multiple groups per report).

A sharp observer will notice that Ethernet is a 48-bit address space, while IPv4 is a 32-bit address space. What does this mean for IPv4-to-Ethernet address conversion? The first four bits of the IPv4 address are fixed for Class D addresses, leaving 28 bits. The lower 23 bits of the IPv4 address are copied into the lower 23 bits of the Ethernet address (the first bit of the NIC-specific bytes of the Ethernet address must be 0, hence 23 bits rather than 24). This leaves five (28 – 23) bits of the IPv4 address unused, meaning each Multicast Ethernet address can represent 32 IPv4 Multicast groups. On receipt of a Multicast frame, the host analyzes the address in the IP header (if present) and filters out messages for groups of which it is not a member[x].
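As a small illustration of this conversion (the class and method names are just for the example), the following Java sketch maps an IPv4 Multicast address onto its Ethernet Multicast address by copying the lower 23 bits into the 01-00-5E prefix:

public class MulticastMac {

    // Copy the lower 23 bits of the IPv4 Multicast address into the lower
    // 23 bits of the fixed 01-00-5E-00-00-00 Ethernet prefix. The top bit of
    // the second octet is dropped, so 32 IP groups share each Ethernet address.
    static String toEthernet(String ipv4) {
        String[] octets = ipv4.split("\\.");
        int b1 = Integer.parseInt(octets[1]) & 0x7F;
        int b2 = Integer.parseInt(octets[2]);
        int b3 = Integer.parseInt(octets[3]);
        return String.format("01-00-5E-%02X-%02X-%02X", b1, b2, b3);
    }

    public static void main(String[] args) {
        // 224.1.1.1 and 225.1.1.1 differ only in bits outside the lower 23,
        // so both map to the same Ethernet address: 01-00-5E-01-01-01.
        System.out.println(toEthernet("224.1.1.1"));
        System.out.println(toEthernet("225.1.1.1"));
    }
}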

Multicast in Action (code example)

Generally, best-effort protocols such as UDP and RTP are used with Multicast, as re-transmission causes duplicate packet receipt for the other recipients. All hosts must share the ability to decrypt and encrypt traffic, meaning many applications that wish to use Multicast have to generate a shared key for all recipients.

Multicast is not supported on the public internet due to security concerns, largely stemming from fear of Denial of Service attacks[xi]. It is, however, generally supported out-of-the-box on off-the-shelf routers like the ones found in the average home. This code example demonstrates rudimentary Multicast communication between nodes within a network.

To compile and run, execute the following command on two or more machines on your LAN:

javac Node.java && java Node;

You should see messages being sent and received by each node using Multicast!

For the full code listing, see the following Multicast gist.
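The gist itself isn’t reproduced here, but a minimal Node along these lines might look like the following sketch; the group address 230.0.0.1 and port 4446 are arbitrary choices for illustration.

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

public class Node {
    // Arbitrary Multicast group and port chosen for this example.
    private static final String GROUP = "230.0.0.1";
    private static final int PORT = 4446;

    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName(GROUP);
        MulticastSocket socket = new MulticastSocket(PORT);
        socket.joinGroup(group); // Triggers an IGMP membership report.

        // Receive messages from the group in a background thread.
        Thread receiver = new Thread(() -> {
            byte[] buffer = new byte[1024];
            while (true) {
                try {
                    DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                    socket.receive(packet);
                    String msg = new String(packet.getData(), 0, packet.getLength(),
                            StandardCharsets.UTF_8);
                    System.out.println("Received: " + msg);
                } catch (Exception e) {
                    return;
                }
            }
        });
        receiver.setDaemon(true);
        receiver.start();

        // Periodically send a message addressed to the whole group.
        String name = InetAddress.getLocalHost().getHostName();
        while (true) {
            byte[] data = ("Hello from " + name).getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(data, data.length, group, PORT));
            Thread.sleep(2000);
        }
    }
}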

Summary

Multicast allows us to distribute data to multiple recipients simultaneously. Hosts join a group, and from that point onwards they are sent copies of all messages destined for that group. Any host on the network can attempt to join one of these groups by sending a special IGMP message to the network’s router to receive that group’s messages.

There may be efficiency benefits to using Multicast in environments where large numbers of sensors or sensor-enabled devices communicate, such as Ubiquitous (and Personal Area) Networks (UPANs). Imagine, for example, a controller device Multicasting information to a set of sensor devices worn around the body.

In the short term, Big Data and Cloud environments, characterized by large numbers of co-located servers housed in data-centers, can take advantage of Multicast communication. This need for timely data analysis by large numbers of machines is currently driving the resurgence of Multicast.

If you are interested in the specifics of Multicast please see the references below. For a more detailed overview, however, have a look at the following links:

 


[viii] There are two types of Multicast: layer 3 and layer 2 Multicast. Layer 2 Multicast can be used independently of layer 3 Multicast. Both implementations use IGMP, necessitating the conversion from IP address to MAC address. Layer 3 Multicast has too large a scope to discuss here.

Endorsement Spamming

Recently LinkedIn added a feature that allows you to endorse people for certain skills. Simply put, this is the ability to ‘plus 1’ a skill for any connection.

Most would agree that the ability to gauge someone’s perceived expertise at a glance is useful.  My problem with endorsements is that the lack of restrictions on handing them out undermines the value of the process. For example, I have one colleague, who shall remain anonymous, who simply endorses everyone for everything (more on this later). The consequence of this approach is that people are happy about receiving endorsements and then feel obliged to endorse him/her back.  My colleague is now a perceived expert in an astonishing number of areas!

(Image: Perceived expert. This guy is good!)

Problem 1 – Reciprocal Endorsements

There is no cap on the number of endorsements one person can give. With the general encouragement LinkedIn gives to ‘pass endorsements forward,’ we arrive at a situation where it is beneficial to endorse everyone for everything in the hope that the favor will be returned. Since there is no way of viewing how many endorsements any particular user has given, a heavy endorser will likely be made to look far more impressive than they actually are – the most effective strategy is to endorse indiscriminately. As this trend continues, we tend toward a scenario where honest endorsements are marginalized.

(Image: Should you ‘pay it forward?’)

The obvious preventative solution is to limit the number of endorsements a user can give. This would limit the range of the tool, but increase the value of each endorsement, while still using a metric that is easy to understand.

If it didn’t limit endorsements, LinkedIn could display how many endorsements a user has given next to how many that user has received. A user viewing a profile could then gauge how meaningful someone’s received endorsements are. The downside to this approach is that the metric cannot pin-point reciprocal endorsements and thus leaves a large margin of error for different types of users. Interestingly enough, LinkedIn already does this for written recommendations, although since these are far fewer in number, the metric is easier for users to assess.

Personally, I think the best solution would be a dynamic cap. One endorsement per connection added on your profile, with no restrictions on their use. Make people consider what their use case is and how to disperse them in a meaningful way.

Problem 2 – Suggested Skill Box

In a bid to encourage endorsements, LinkedIn displays an endorsement box on all profile pages. It proposes four suggestions:

(Image: LinkedIn endorsement suggestions. Who do you want to endorse?)

First, the skills generated in the skill box are not always the skills specified on the relevant user’s profile. The LinkedIn algorithm often suggests skills that are completely irrelevant. Overnight you can become a perceived expert in something you have no experience in at all. My personal favorites this week were Bash and Python, for which I received 3 endorsements each. I have never, in my life, held a developer/engineer position or written a line of Python!

Second, these questions are far too suggestive. Being offered a name and an associated skill is simply leading the user down a path, and as discussed above, this is often the wrong path. “Who would you recommend for Java?” would be a far better question. The user should at least have to think about the skill set and their connections.  Even better, how about limiting the suggested endorsements to those that you have noted as your own skills? Maybe this wouldn’t be enforced if the user were to go to a connection’s page to endorse, as this requires more effort than a skill box endorsement.

Finally, the interface for the endorsement box just encourages inaccuracies. The most prominent (and highlighted) button is, of course, ‘Endorse all 4’. If you close one recommended endorsement, another appears. If you endorse one connection, another will appear. If you ‘endorse all 4’, you can do four more! You can endorse 100 connections in your network for a random skill in less than 30 seconds. Now that’s efficiency. I have been told by my anonymous source that if you continue to ‘endorse all 4’ the suggestions do eventually dry up…

Wrap Up

I’ve had quite a lot of fun on LinkedIn this week, adding new connections and endorsing people I know well, a little, and (although I must confess to being part of the system I’m complaining about) really not at all. I’ve had a variety of responses, from thank-you emails and return endorsements, to the dreaded no response (disclaimer – I have since deleted all the extra skills that were created and the related endorsements!).

As attractive as it is to have endorsements coming out of your ears, with the current approach LinkedIn is taking to encourage their use, it is only a matter of time before people disregard what could be an incredibly useful feature. You can see the inevitable happening already – try it.