Metaphysical Developer

You can Trust your Data to the Cloud (In theory)

Posted in Systems by Daniel Ribeiro on November 30, 2009

Any sufficiently advanced technology is indistinguishable from magic

Arthur C. Clarke

Cloud computing services such as Amazon EC2, Salesforce, Google App Engine and Heroku promise several advantages to the traditional datacenter model: economies of scale, reduced costs, simplification of IT infrastructure, scalability, reliability, the ability to pay-as-you-go. Nevertheless, there are some concerns regarding trust, transparency, vendor lock-in and security.

One of the the most glaring controversies around cloud computing is the data privacy issue: your cloud provider has total access to all your data. While it can be argued that the trust issue is not that much more than deploying your own servers to a datacenter, it would be desirable if this was a non issue. Standard encryption schemes are generally useless to solve this issue: if data is encrypted/decrypted inside the cloud’s infrastructure, they will have access to the keys (at least as in memory data), not only the decrypted data visualized. Therefore, traditional schemes allow you only to safely retrieve and store encrypted data, which usually defies the purpose of using a cloud provider (mostly with huge amounts of data).

It was an open problem whether it would be even possible to compute on the unencrypted data without decrypting it, until Craig Gentry, currently working on IBM‘s Cryptography Research Group, recently proposed as part of his Ph.D thesis, a scheme (with the unique property of being fully homomorphic) that enables it. Whith this scheme, any function can be evaluated on the decrypted text without decrypting, while keeping the result encrypted in itself. The function can itself be made secret, therefore allowing the scheme to provide total privacy on input, output and computation.

The end result is: on any cloud setting where you can compute arbitrary functions, you can implement the proposed fully homomorphic scheme, and trust your data to your cloud provider without fear of giving up on privacy, without having to solely rely on Service Level Agreements.

The remaining issue is: the proposed scheme is efficient only under theoretical standards. Which means it runs on polynomial time on the security parameter. The problem is that the polynomial is of degree higher than six, which means it is not really practical to use it. For instance, Gentry estimates that if Google used it in its search, the results would take 1 trillion times more to be given. Therefore, the problem has been solved in theory, while still remaining unpractical. On the bright side, optimizing a possible problem tends to be easier than chasing an open one, or trying to prove it can’t be solved.