Any sufficiently advanced technology is indistinguishable from magic

- Arthur C. Clarke

Cloud computing services such as Amazon EC2, Salesforce, Google App Engine and Heroku promise several advantages over the traditional datacenter model: economies of scale, reduced costs, simpler IT infrastructure, scalability, reliability, and the ability to pay as you go. Nevertheless, there are some concerns regarding trust, transparency, vendor lock-in and security.

One of the most glaring controversies around cloud computing is the data privacy issue: your cloud provider has total access to all your data. While it can be argued that this requires little more trust than deploying your own servers to a datacenter, it would be desirable for it to be a non-issue. Standard encryption schemes are generally useless here: if data is encrypted and decrypted inside the cloud's infrastructure, the provider has access not only to the decrypted data but also to the keys (at least as in-memory data). Therefore, traditional schemes only allow you to safely store and retrieve encrypted data, which usually defeats the purpose of using a cloud provider (especially with huge amounts of data).

It was an open problem whether it would even be possible to compute on encrypted data without decrypting it, until Craig Gentry, currently working at IBM's Cryptography Research Group, recently proposed, as part of his Ph.D. thesis, a scheme that enables it (with the unique property of being fully homomorphic). With this scheme, any function can be evaluated on the encrypted text without decrypting it, and the result itself stays encrypted. The function can itself be made secret, therefore allowing the scheme to provide total privacy of input, output and computation.
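To make the idea of computing on encrypted data concrete, here is a minimal sketch using textbook (unpadded) RSA, which happens to be homomorphic for multiplication only. This is not Gentry's scheme, which supports arbitrary functions; the tiny primes, the exponents and the helper names below are assumptions chosen purely for illustration and offer no real security.

```python
# Toy illustration only: textbook RSA with tiny, insecure parameters.
# It is NOT Gentry's fully homomorphic scheme; it supports multiplication only.

P, Q = 61, 53   # hypothetical small primes (far too small for real use)
N = P * Q       # public modulus
E = 17          # public exponent
D = 2753        # private exponent, chosen so (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt(message: int) -> int:
    """Encrypt with the public key (E, N)."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Decrypt with the private key (D, N)."""
    return pow(ciphertext, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """What an untrusted server could do: multiply two ciphertexts.

    No key is needed, and the server never sees the plaintexts.
    """
    return (c1 * c2) % N


if __name__ == "__main__":
    a, b = 7, 6
    encrypted_product = multiply_encrypted(encrypt(a), encrypt(b))
    # Only the key owner can read the result, which equals a * b (mod N).
    print(decrypt(encrypted_product))  # 42
```

The server-side function only ever handles ciphertexts, which is the property a fully homomorphic scheme extends from a single operation to any computation.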

The end result is that, in any cloud setting where you can compute arbitrary functions, you can implement the proposed fully homomorphic scheme and trust your data to your cloud provider without giving up privacy and without having to rely solely on Service Level Agreements.

The remaining issue is that the proposed scheme is efficient only by theoretical standards, which means it runs in polynomial time in the security parameter. The problem is that the polynomial is of degree higher than six, which means the scheme is not really practical to use. For instance, Gentry estimates that if Google used it in its search, results would take about a trillion times longer to be returned. Therefore, the problem has been solved in theory, while still remaining impractical. On the bright side, optimizing a solved problem tends to be easier than chasing an open one, or trying to prove it can't be solved.

Using pre-made solutions as parts of solving a bigger problem is a common motif in several areas of human knowledge. At first glance, it makes perfect sense not to waste valuable time making something that has already been done. Following this ideal, a lot has been said about doing this with software, which is commonly referred to as code reuse.

There are three canonical types of code reuse, as listed by Erich Gamma and Ralph Johnson: software libraries, design patterns and frameworks. There are other kinds of code reuse, such as software platforms (similar to both a framework and a collection of libraries), services, or even using a complete application and building on top of it. Each of these types of code reuse differs in its learning curve, in its flexibility (in the sense of how many different new applications, libraries, platforms or frameworks can be built from it) and in the amount of functionality and design embedded in it.

But things are not as simple as putting Lego bricks together. When looking for a reusable piece of software, how do you pick the most suitable one? Even if there is only one piece of one kind, you still have to figure out whether it is suitable, or how much work will be needed to make it suitable (which is the most common case).

This is because the piece can be too complex, poorly documented, or not robust enough; it may not fulfill all the desired properties, lack bindings for the language or platform you are using, have a license that is not permissive enough, make trade-offs that are not what you would expect (such as choosing CP instead of CA in CAP), not be backed by a big company, and so on. This gets a lot more troublesome when you have many pieces of software to choose from. Since evaluating them takes time, a more agile approach is to decide how much time to spend gathering information on each piece, trading that time against the risk of making an inappropriate decision (always weighted by how bad such a decision can be, somewhat similar to a SWOT analysis). Even then, in your core domain, reusing big pieces of code is hardly advisable.

However, not all software is reusable, or at least not easily reusable. The reasons lie in the difficulty of producing a reusable design (such as the difficulties stated by Uncle Bob Martin in his SOLID series, or by Neil Bartlett in his Component Models Tutorial) and in the fact that making software easy for others to use requires extra support (such as documentation, tutorials and/or screencasts). Not to mention that most businesses are not about launching frameworks or application libraries, and are therefore not interested in this work (which can be good or bad, depending on the cost of making the code reusable and the money made or saved by doing it). As a result, a lot of software is not reusable. All of this could suggest that making reusable bits of software requires a Big Design Up Front. However, agile practitioners suggest making code reusable only when it is actually needed (a last responsible moment approach). This has worked well in practice: the framework Ruby on Rails was extracted from other projects at 37 Signals, and Apache Hadoop was extracted from another project, Apache Nutch.

Code reuse is not a simple matter, and should not be taken lightly. It involves decisions that can be more relevant from an economic or business perspective than from a purely technical perspective. In the end, the real failure of reuse is the failure to realize this.

At least two trends are bringing parallel and distributed programming into focus: computers with multiple cores are getting cheaper and gaining more cores, and websites are leveraging terabytes' worth of user content. There are several services, tools and programming models around to help people cope with such trends, such as Amazon EC2, Hadoop, Fork Join, Actor Models, non-relational databases, and so on. A couple of things to bear in mind when using multiple cores or multiple computers (with or without these tools):

  • Amdahl’s law: It states that, with a fixed problem size, there is a point for every program after which adding more computational units does not give you improved performance. Computational units can be either cores (for parallel programming) or hosts (for distributed programming). Gustafson’s law tackles the case where you do not fix the problem size, but this does not really help when you need a faster response for your current problem size.
  • Brewer’s CAP Theorem: This has to do with distributed programming only, and many people have said a lot about it. Essentially, it means that a distributed system can have at most two of the following three properties: consistency, availability and partition tolerance (a kind of fault tolerance).
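Amdahl's law from the list above is usually written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of computational units. The sketch below simply evaluates that formula; the function names and the example value p = 0.9 are assumptions made for illustration.

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed problem size."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on the speedup, however many units are added."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction


if __name__ == "__main__":
    p = 0.9  # assumed: 90% of the program can run in parallel
    for n in (1, 2, 8, 64, 1024):
        print(f"{n:5d} units -> speedup {amdahl_speedup(p, n):.2f}")
    print(f"limit -> {speedup_limit(p):.2f}")
```

With p = 0.9 the speedup can never exceed roughly 10, no matter how many cores or hosts are added, because the serial 10% of the work dominates.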

Amdahl’s law is quite troublesome: you cannot really do anything about it, other than changing the algorithms involved. The CAP Theorem, however, allows you to trade off one property for another. You can relax consistency into eventual consistency, you can weaken your tolerance to network partitions, or you may live with fewer nines of uptime. Which one you will have to abandon depends on your application’s profile. Dealing with this means no longer looking for ACID (Atomicity, Consistency, Isolation, Durability) systems, where consistency is very important, but looking for BASE (Basically Available, Soft-state or Scalable, and Eventually consistent) systems. It means listening to Gregor Hohpe’s suggestion and accepting that Two Phase Commit may not be the right way to go, in favor of pursuing the new ACID properties (Associative, Commutative, Idempotent and Distributed).
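As a minimal sketch of what associative, commutative and idempotent operations buy you, consider a grow-only counter that replicas merge by taking the per-node maximum. This example is an illustration written for this text, not something taken from Gregor Hohpe's article, and the replica names are hypothetical.

```python
from typing import Dict

# Each replica tracks how many increments every node has made.
Counter = Dict[str, int]


def increment(counter: Counter, node: str) -> Counter:
    """Return a new counter with one more increment recorded for `node`."""
    updated = dict(counter)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: Counter, b: Counter) -> Counter:
    """Combine two replicas by taking the per-node maximum.

    The order, grouping and repetition of merges do not change the result,
    so replicas can exchange state in any order, any number of times.
    """
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def value(counter: Counter) -> int:
    """Total number of increments seen by this replica."""
    return sum(counter.values())


if __name__ == "__main__":
    replica_1 = increment(increment({}, "node-1"), "node-1")
    replica_2 = increment({}, "node-2")
    merged = merge(replica_1, replica_2)
    # Receiving the same update twice (a duplicate delivery) is harmless.
    merged_again = merge(merged, replica_2)
    print(value(merged), value(merged_again))  # 3 3
```

Because merging is safe to repeat and reorder, replicas can converge without a coordinated commit protocol, at the cost of being only eventually consistent.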

Not every system requires looking into these trends and thinking about such limitations and trade-offs. But if yours does, then keeping these in mind might come in handy.

It has been a while since Java was the sole language running on the JVM. Scala is another such language, which gained a lot of attention recently for being used to scale Twitter’s backend. Scala differs from most other languages that run on the JVM, such as Groovy, JRuby and Jython, in that it is statically typed. This means that, as in Java and C#, the types must be known at compile time. Scala is usually introduced as being both OO and functional. While this statement is true (and daunting, as many people are uncomfortable with the f*** word), it fails to capture the important aspects of Scala.

The most direct benefits of using Scala include:

  • Compatible with Java. Kinda obvious (as so are all the other 200+ languages on the JVM), but it is such an important feature that it should not be overlooked. This means that Scala can use all Java libraries and frameworks, which shows respect for people’s and companies’ investment in the technology.
  • Joint Compilation. This means that, like Groovy, Scala classes are compiled to Java classes, and can therefore be used in Java projects (even by Java classes in the same project where they are defined). Even if your team decides to make the complete move to Scala, this can be useful for integrating with dynamic languages via JSR 223.
  • Type Inference. If the compiler can guess the type (and it usually can), you don’t have to tell it. This allows Scala code to be as concise as dynamic languages, while still being type safe.
  • Implicit conversion allows you to achieve in a type safe way what extension methods do for C# and open classes (mostly) do for Ruby. That is: add methods to types you might not have defined yourself (such as strings, lists, integers). This is one of the features that make Scala DSL friendly.
  • Object immutability is encouraged and easy to accomplish. Scala even comes with immutable collections built-in.
  • Getters and Setters are automatically generated for you. If you don’t want them (if you only want setters for example), you have to explicitly make them private. Which is not a problem, as the common case is to want them.
  • Scala has first-class functions and implements an enumeration protocol (with the Iterable trait), which helps keep code clearer and more concise, and brings several other benefits.
  • The Actor programming model eases the development of highly concurrent applications.
  • Exceptions don’t have to be explicitly caught or thrown. It can be argued that having checked exceptions does more harm than good.

These features alone would be enough to make Scala a very interesting language, and worthy of being heralded as the current heir apparent to the Java throne by one of JRuby’s creators, Charles Nutter (a view somewhat shared by Neal Gafter), or even of being endorsed both by Groovy’s creator, James Strachan, and by the inventor of Java, James Gosling. Nonetheless, Scala is deep, and there are several exciting advanced features that allow developers to be more productive. But learning such features before getting a good grasp of the basics can be quite frustrating, more so without good supporting literature (such as IBM’s, Artima’s, Jonas Bonér’s, Daniel Spiewak’s, Sven Efftinge’s, the official one, and several others). However, it is quite feasible, and even encouraged, to delve into deeper concepts as you need them.

Even though Scala has academic roots (as shown by its papers page and by some of the advanced concepts those papers tackle), it has been used successfully in enterprise projects at companies other than Twitter, such as Siemens, Électricité de France Trading and WattzOn.

Besides all the good points, Scala does have some rough edges. Even though many people are working on overcoming them, they are likely to be relevant in the short term:

  • Incipient IDE support. As Lift’s author expressed, IDEs for Scala, while undergoing a lot of development, are not what they are for Java. Support for refactoring, code completion and unit test integration is poor. Not to mention the fact that most framework support tools will not play nicely with Scala. This can also put off some newcomers, as an IDE can help people learn the language. On the other hand, Martin Fowler puts this IDE situation in perspective, as a language that allows you to be more productive can more than make up for the lack of sophisticated tools.
  • Joint Compilation is not supported by most IDEs either. Again, this is likely to change as Scala grows in popularity.
  • Immutability of a class is not really immutability, since the objects it references may not be immutable themselves. And there is no way at the moment to ensure that the whole object graph is immutable.
  • Making JSR 223 work perfectly with Scala can be challenging. On the other hand, making it work well enough is quite attainable.
  • Scala doesn’t support metaprogramming. This can be worked around by combining it with dynamic languages, such as Ruby (following a polyglot programming approach), but if you are going to make heavy use of metaprogramming, then using a whole different language may be a better solution (Fan is another statically typed language that runs on the JVM, similar to Scala, that has metaprogramming support).
  • Frameworks that expect Java source, such as the client-side GWT, will not play nicely with Scala (note that people have made Scala work with GWT on the server-side though). However there is an ongoing project that will translate Scala into Java source.
  • The syntax and some concepts are a bit different from Java, such as: inverted type declaration order, the underscore being used in place of wildcards, asterisks and default values, many kinds of nothing, no static methods (you need to use singleton objects instead) and other minor things. The documentation walks through this quite nicely, but keep in mind that going from writing Java to writing Scala code is not an automatic transition.

As Joe Armstrong said, the need for languages that allow developers to easily make use of CPUs with multiple cores will only increase as such CPUs become cheaper and gain more and more cores. Scala is well suited to this task, while Java’s development is stuck dealing with issues that come from being widely deployed, uncertainties about how open it will be in the future, and political issues with some of its main contributors. Given the situation, Scala seems to fit the role of successor to Java’s throne quite nicely.

Pair Programming is a technique where two developers work on the same piece of code at the same time. Popularized by Extreme Programming, it had been applied long before.

A team that applies it consistently can achieve several benefits, such as:

  • Reducing the risk of Overengineering: a developer is more likely to overengineer solutions when left alone than when a partner keeps them grounded on the problem at hand.
  • Reducing bugs: a partner reading the code is more likely to catch bugs than the person coding, who is usually focused on the current line of code and not on the overall context.
  • Improving Quality of Code: refactorings are easier to accomplish safely when one person is not focused on actually performing the refactoring, and can therefore focus on ensuring each step is safe and on checking for further refactoring opportunities. Also, the names of methods, modules, classes, functions and variables are readable by at least two people’s standards.
  • Knowledge Sharing: this is one of the biggest benefits of Pair Programming because when developers are pairing, they share several kinds of knowledge:
    • Business Knowledge: even if not doing DDD, the developers communicate about the problem on a business level, what concepts are important, the business rules, and so on.
    • Technical Knowledge: such as frameworks, object libraries, platforms, ORM frameworks, databases, programming languages and tools that are being used, or could be used, on the project.
    • Theory Knowledge: a software project is not just solving business rules. From time to time, more abstract concepts are needed to solve a problem, and developers can teach each other about concepts such as concurrent programming, aspect oriented programming, design patterns, architectural patterns, graph theory, boolean algebra, security and cryptography, regular expressions and formal languages, problem complexity and software design.
    • Project Knowledge: how the project works, how it is deployed, how to access the Subversion repository (or any other SCM), how and which methodologies are applied, how bugs are tracked, and so on. It also includes knowledge of the history of project decisions, such as: why this library was used instead of that other one, why this algorithm was selected, why this architecture is being used, which other frameworks were evaluated…

    In order to allow the knowledge to be shared among all participants, it is vital that pairs are constantly changed within a team. How often is very dependent on the size of the team, the features being tackled, and the team’s willingness to change pairs more or less often. All these types of knowledge sharing bring some other nice advantages. The project carries less risk, as no part of the code can be changed by only one person (especially useful when that one person is on vacation, sick in a hospital, or leaves the project). The team gains more confidence to employ Collective Code Ownership. The team becomes more productive over time, as fewer of the basics need to be explained and more pairs can tackle an issue that otherwise only a specialist would be able to handle. Finally, the team as a whole can better estimate the size of the features to be implemented (be they story cards, tasks, or old-fashioned use cases), as everyone is more aware of the code base and the difficulty of the problems to be tackled.

Other benefits of pair programming (an in-depth analysis can be found in this paper) include: improved morale, fewer interruptions, more team focus and higher productivity. It is also important to note that a team cannot pair program all day, as people usually have other tasks to attend to, such as meetings, training, reading and responding to emails, and drinking coffee.

This technique can be hard to actually apply, as many developers are not used to pairing, managers are afraid of it costing more (on the assumption that two people working on the same machine can never be more productive) and not everyone works well with such constant peer review and collaboration. Agile Methodologies value individuals and interactions over processes and tools, and this is quite critical when employing Pair Programming, because these difficulties (which may exist in varying degrees) can only be overcome if the team (including managers) is willing to overcome them. Implementing it incrementally (maybe only a few hours a day, building up as the team sees the benefits) can ease the resistance, and the team can gain experience with it while incrementally discovering the impediments to pairing more often. Also, applying Coding Standards first can reduce the occurrences of pairs fighting about minor issues, such as whether to use underscores or camel case. Finally, if a team struggles in the beginning, mostly with managing who codes and who observes, the variant Ping Pong Pair Programming can be easier to begin with. However, it implies Test Driven Development, which can be tricky to start if the team doesn’t do it already.

Pair programming is a great technique, one that is usually overlooked even by agile teams (as Rachel Davies was able to see on some projects she coached). But with so many potential benefits, it is always worth giving it a shot.