Insights on Parallel and Distributed Systems
At least two trends are making paralell and distributed programming come to focus: computers with multiple cores getting cheaper and getting more cores, and websites leveraging terabytes worth of of user content. There are several services, tools and programming models around to help people cope with such trends, such as Amazon EC2, Hadoop, Fork Join, Actor Models, Non relational Databases, and so on. A couple of things to bear in mind when using multiple cores or multiple computers (with or without these tools):
- Amdahl’s law: It states that, with a fixed problem size, there is a point for every program after which adding more computational units do not give you improved performance. By computational units, you can take either cores (for paralell programming) or hosts (for distributed programming). Gustafson’s law tackles the issue when you do not fix the problem size, but this does not really help when you need faster response for your current problem size.
- Brewer’s CAP Theorem: This has to do with distributed programming only, and many people said a lot of about this. But essentially it means that a distributed system can only have at most two out of the three following properties: consistency, availability and partition tolerance (a kind of fault tolerance).
Amdahl’s law is quite troublesome: you cannot really do anything about it, but changing the algorithms involved. But CAP Theorem allows you to trade off one property for another. You can relax consistency into eventual consistency, you can relax on fault tolerance against partition tolerance, or you may live with less nines of up-time. It all depends on your application’s profile which one you will have to abandon. Dealing with this means no longer looking for ACID (Atomicity, Consistency, Isolation, Durability) systems, where consistency is very important, but looking for BASE (Basic available, soft-state or scalable, and eventually consistent) systems. It means listening to Gregor Hohpe’s suggestion and accepting that Two Phase Commit may not be the right way to go in detriment to pursuing the new ACID properties (Associative, Commutative, Idempotent and Distributed).
Not every system requires looking into these trends and thinking about such limitations and trade-offs. But if yours does, then keeping these in mind might come in handy.