Cloud Security: Plugging the Hole in Data Leakage

This is part of a series of blog posts on cloud security by Carlos Cardenas, our Director of Solutions Engineering. Carlos is a security expert who came to Joyent from the Institute for Cyber Security (ICS) at the University of Texas at San Antonio. While at ICS, Carlos worked under Ravi Sandhu, PhD, one of the leading security experts in the world.

This post features a guest co-author, Dan Bogdanov, PhD, from Cybernetica, whose expertise lies in securely sharing and analyzing confidential data. He has designed and implemented an application server called Sharemind that collects and processes data while it is still encrypted. Dan serves as editor of the ISO/IEC 29101 standard on a Privacy Reference Architecture and was named Outstanding Young Person in 2013 by JCI.

In this post, we are going to discuss the current challenges of sharing private data for analysis while still being able to use the data in a reasonable manner.

Background

Let's take a step back and describe some scenarios that illustrate why this problem is challenging and necessary to solve.

Say there’s a set of information that contains plenty of Personally Identifiable Information (PII) but needs to be queryable, because the data as a whole drives decisions and has a wide range of implications. Census data and tax returns are good examples. They help answer important questions like “What’s the average salary of a software engineer?” or “What’s the tax burden for a family of 4 if law X is passed?”

However, having a dataset like this, regardless of how beneficial it may be, allows people to identify others along with their confidential data. These problems are known as information leakage and, to some extent, kleptography. Information leakage occurs when a closed system (think black box) leaks information to those who are unauthorized or who do not have a need to know. Here’s a classic example:

There’s a C-130 that is heading to A. This plane has total capacity C, which is classified Top Secret. It can carry X Unclassified cargo, Y Secret cargo, and Z Top Secret cargo. Let’s say the plane has reached most of C when an Unclassified operator, O, is asked whether the next item, Item, can go on the plane. O does not know C, Y, Z, C{y,z} (the capacities for Y and Z) or cur{Y,Z} (the current loads of Y and Z). O only knows X, Cx (the capacity for Unclassified cargo) and its current value, curX. So O must determine whether Item + curX is still <= Cx. If it is, Item can be loaded; Cy and Cz must be decremented because that space has been taken, and curC is incremented.

So what happens if a Top Secret operator, Ots, adds a Top Secret item? The same procedure applies, but this time Cx, Cy and curC are altered. However, if O was observing the system when Ots was adding that item, O now knows there’s an item on board that is either Secret or Top Secret! These three compartments need to operate under the standard “no read up, no write down” security model, Bell-LaPadula. A simple way to do that is to make C{x,y,z} immutable by other levels, but that is suboptimal since the plane will be underutilized most of the time.
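To make the leak concrete, here is a minimal Python sketch of the cargo example. The class name, the level labels and every number are made up for illustration; the only rule taken from the example above is that loading an item at one level decrements the capacity visible to the other levels.

```python
class Plane:
    """Toy model of the cargo example; all names and numbers are hypothetical."""

    def __init__(self, caps):
        self.cap = dict(caps)                    # Cx, Cy, Cz
        self.cur = {level: 0 for level in caps}  # curX, curY, curZ

    def fits(self, level, weight):
        # The check an operator performs: Item + curX <= Cx
        return self.cur[level] + weight <= self.cap[level]

    def load(self, level, weight):
        if not self.fits(level, weight):
            raise ValueError("item does not fit")
        self.cur[level] += weight
        # The space is shared, so every other level loses that capacity.
        for other in self.cap:
            if other != level:
                self.cap[other] -= weight


plane = Plane({"U": 100, "S": 100, "TS": 100})
plane.load("U", 40)

# The Unclassified operator checks whether a 50-unit item would fit.
before = plane.fits("U", 50)

# A Top Secret operator loads an item that O is not cleared to see.
plane.load("TS", 30)

# Same check, different answer, although no Unclassified cargo was added.
after = plane.fits("U", 50)

print(before, after, plane.cap["U"])
```

O never sees the Top Secret item, yet the drop in Cx reveals that something with a higher classification was loaded, and even how big it was.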

Current Challenges

There is a field called Homomorphic Encryption, or HE for short, that allows a specified set of operations to be performed on ciphertext such that the result, once decrypted, matches the result of performing the same operations in the clear.
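A tiny illustration of the idea, not of any production HE scheme: unpadded “textbook” RSA happens to be homomorphic for multiplication only, so multiplying two ciphertexts yields a ciphertext of the product. The primes below are deliberately tiny and the whole sketch is insecure; it only shows the property.

```python
# Toy textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)


def encrypt(m):
    return pow(m, e, n)


def decrypt(c):
    return pow(c, d, n)


a, b = 7, 6

# Multiply the ciphertexts; the plaintexts are never touched.
c_product = (encrypt(a) * encrypt(b)) % n

result = decrypt(c_product)
print(result)  # 42, the same as a * b computed in the clear
```

This works only while the product stays below the modulus, and it supports a single operation. Schemes that support both addition and multiplication on ciphertexts are far more involved.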

What Does This Mean?

Imagine that YOU want to query a dataset owned by JOYENT. For example, YOU want to know how many PRODUCTS were bought by enterprise customers in the last year, to assess your market share of PRODUCTS.

Without HE, the process would be:

  • YOU encrypt the search terms and upload them.
  • JOYENT decrypts the search terms so that they know what to look for.
  • JOYENT decrypts the dataset so they have somewhere to search.
  • JOYENT performs the search using the decrypted dataset.
  • JOYENT encrypts the search results, if there are any, and returns them to YOU.

Additionally, one hopes that:

  • JOYENT cryptographically destroys all traces of the dataset, including the search terms and the decrypted dataset once the search is complete.
  • YOU are not abusing your access of JOYENT’s dataset.

Now imagine that JOYENT’s dataset stays encrypted and YOU take the encrypted search terms and query the dataset directly (still encrypted) to get the same results. There is no need to decrypt anything, not to mention no longer having to rely on YOU not to lose, steal, or sell JOYENT’s dataset. This is what HE and similar technologies promise.

However, there’s one catch: fully homomorphic encryption (FHE), which supports arbitrary computations on encrypted data, is computationally intensive, so intensive in fact that it’s borderline impractical for anything worthwhile. A few years ago, IBM stepped up and helped with this problem by releasing its HElib software, which reduces the computational cost. Even though it is functional and able to run today, the computational power needed to run it on a decent-size dataset is still impractical.

There’s also property-preserving encryption and systems like CryptDB from MIT. These systems are reasonably fast, but they are also limited in the kinds of analyses they can perform. For example, processing fractional numbers (e.g. sums of money, measurements) is non-trivial.

There’s got to be a better way.

How to Overcome These Challenges

Of all people, DARPA, the research agency of the Department of Defense (DoD), is tackling the problem. Its PROCEED program is pushing the state of the art in the kinds of analyses that can be performed on protected data.

One of the teams participating in PROCEED comes from an Estonian company called Cybernetica. You may know Estonia as the host of the NATO Cooperative Cyber Defence Centre of Excellence and a country with a Twitter-happy president.

Cybernetica’s Sharemind is an application server technology based on homomorphic secret sharing. Secret sharing protects data values just like encryption, but with a smaller computational overhead. On the other hand, you’ll need several servers to store the data.
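To give a feel for secret sharing, here is a minimal sketch of generic additive secret sharing in Python. This is not Sharemind’s actual protocol; the modulus, the function names and the three-server setup are assumptions chosen for illustration. Each value is split into random-looking shares, one per server, and the servers can add their shares locally without ever seeing an input.

```python
import secrets

MODULUS = 2**32  # all arithmetic is done modulo this value (an assumption)


def share(value, parties=3):
    """Split value into `parties` shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Combine all shares to recover the secret."""
    return sum(shares) % MODULUS


# Hypothetical confidential inputs from three data owners.
salaries = [52000, 61000, 48000]
shared = [share(s) for s in salaries]

# Server i holds the i-th share of every input and adds them locally.
server_sums = [sum(column) % MODULUS for column in zip(*shared)]

# Only the combined result is reconstructed, never an individual salary.
total = reconstruct(server_sums)
print(total)  # 161000
```

Addition costs each server a handful of integer operations, which is where the smaller overhead comes from. The trade-off is the one mentioned above: the shares are only safe as long as they sit on separate servers that do not pool them.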

Even with the need for several servers, Sharemind can do pretty impressive stuff. They recently showed an application prototype that can estimate whether two orbiting satellites will collide with each other. The coolest thing was that they were doing secure operations on floating point numbers - something that the competition is still struggling with.

How Does This Relate to Cloud Computing?

Even though the work is partially funded by the DoD, the civilian applications are much easier to find. For example, imagine a cloud-based data sharing service that lets you combine your confidential data with other data without disclosing your secrets to others. This could have profound implications to the whole open data movement.

Or let’s hit one idea closer to home. In the cloud business, there are several companies providing customers with monitoring and reports. Do you want to know how the various cloud offerings size up against each other? You can ask a company like Cloud Spectator. But how do they know, you ask? Well, they ask us, of course. And we’re not always willing to answer, because we don’t know if they will take our metrics directly to the competition.

Something like Sharemind could really work here. The cloud service providers (CSPs) share their data with a monitoring company in such a way that the latter cannot see the raw values but can still present aggregated reports to its customers. If the plan is executed and explained well enough, the CSPs may (I’m making no promises here) be willing to provide more information, leading to a better cloud ecosystem. And that’s something we’re working on here at Joyent.



Post written by Carlos Cardenas & Dan Bogdanov, PhD