Metaphysical Developer

High Level Concurrency with JRuby and Akka Actors

Posted in Languages, Systems by Daniel Ribeiro on December 16, 2010

Many developers are used to low-level concurrency primitives, such as locks, monitors and semaphores. Java also has higher level concurrency utilities such as Atomic Objects and Fork/Join framework. Such primitives still require a lot of attention to shared variables, and are very easy to get wrong. Ilya Grigorik recently discussed other models of concurrency on his recent post Concurrency with Actors, Goroutines & Ruby, where he even introduced a ruby port of Go‘s concurrency mechanism. These models attempt to make concurrent programming easier.

The actor model is another very simple high level concurrency model: actors can’t respond to more than one message at a time (messages are queued into mailboxes) and can only communicate by sending messages, not sharing variables. As long as the messages are immutable data structures (which is always true in Erlang, but has to be a convention in languages without means of ensuring this property), everything is thread-safe, without need for any other mechanism. This is very similar to request cycle found in web development MVC frameworks.

Scala is famous for coming with an Actor library built-in. However, using Scala libraries in Ruby is not easy[1]. Akka is another great project that implements Actors, however it has a java Api, which makes the JRuby integration easier. Why JRuby? Not only to access Akka’s actor library, but also because JRuby is one of the few ruby implementations that doesn’t have the GIL, therefore it allows true concurrency for all types of applications (IO bounded or not).

Integrating Akka with JRuby

For starters, let’s first create a simple actor in Java:

public class PingActor extends UntypedActor {
	public void onReceive(Object message) throws Exception {
	    if (message instanceof String) {
	    	System.out.println("!!! Acted on: " + message);
	    }
	    else throw new IllegalArgumentException("Unknown message:" + message);
	}
}

This simple actor will just output any message it receives prefixed with “!!! Acted on: “, and will throw exception on any message that is not a string. This example show how simple it is to define an actor: just define a onReceive method that is called whenever a message is sent.

To see this actor working, we need four lines:

		ActorRef actor = actorOf(PingActor.class).start();
		actor.sendOneWay("hello actor world");
		TimeUnit.SECONDS.sleep(1);
		ActorRegistry.shutdownAll();

The first gets an actor reference and starts it. It is important to note that you cannot create an actor just by invoking new. Not in java or scala (we can solve this in Ruby). This is because there is a lot of AOP going on the background[2]. The second line just sends the message to the actor asynchronously. There two other ways of sending messages, which I’ll not cover, but you can read more in Akka’s documentation.

The last two lines just give time to the message reach the actor (remember, the sendOneWay method is non-blocking), and stops all actors on the system. Pretty simple right? Let’s see how we can do the same in JRuby. Setting up the stage:

require 'java'
module Akka
  include_package 'se.scalablesolutions.akka.actor'
end

These lines enable java and make a ruby module with all the classes of se.scalablesolutions.akka.actor package. Basic JRuby setup. Now on to defining the actor:

class PingActor < Akka::UntypedActor
  def self.create(*args)
    self.new(*args)
  end

  def onReceive(message)
    puts "!!! Acted on: #{message}"
  end
end

Here we have our first differences. The onReceive is just a cleaner version of the Java one. No type annotations, no type checking and a simpler string output. However, we have to define a classmethod called create, which just invokes new. This method seems to be created by the AOP part of Akka, which doesn’t seem to work on Ruby subclasses of UntypedActor. However, we can defined it ourselves. Now to actually using the actor:

actor = Akka::UntypedActor.actorOf(PingActor).start
actor.sendOneWay "hello actor world"
sleep 1
Akka::ActorRegistry.shutdownAll

Pretty much the same four lines as on the Java version, with a little less parenthesis, and a terser sleep method. The ruby code can be found on this page, and Java code here.

Fixing the Ruby Interface

Much of the code in the former example is infrastructure, but we can work around the static nature of Java classes in ruby. As factored out in akka.rb, we can gather this functionalities into a base class, and rewrite the first example in 8 lines:

require 'akka'
class PingActor < Actors::Base
  def onReceive(message)
    puts "!!! Acted on: #{message}"
  end
end
PingActor.spawn.sendOneWay "hello actor world"
Actors.delayedShutdown 1

Using closures we can even enhance our JRuby api with some ideas from the Scala api, making it down to 3 lines:

require 'akka'
Actors.spawn { |m| puts "!!! Acted on: #{m}" }.sendOneWay "hello actor world"
Actors.delayedShutdown 1

In this example spawn takes a block, and creates an actor that executes it every time it receives a message. As in the Scala API, spawn starts the actor as well as creating it.

But every Object is an Actor!

Alan Kay, the inventor of Smalltak and of the term OO, once said:

I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea. The big idea is “messaging”.

This is one of the reasons that Erlang with its actors form an object oriented language[3]

If we look into the resemblance of sendOneWay and the reflective method invocation, which in Ruby is made through send or __send__, it is quite easy to adapt the ruby method invocation to Actor message sending. We start with a simple delegator:

MethodParameters = Struct.new :name, :args, :block

class DelegatorActor < Base
    def self.new(target)
      ret = super()
      ret.instance_variable_set(:@target, target)
      return ret
    end

    def onReceive(message)
      param = message
      @target.__send__ param.name, *param.args, &param.block
    end
  end

The important part is the onRecieve message, which takes a MethodParameters object and invoke on the target. The caveat here is that we need to override the new method, because, for some reason, ruby subclasses of Akka UntypedActors will not invoke the initialize method with the arguments passed[4]. However, by turning any object into an actor, this can be the only place such hack is needed.

Now, the next step: adapting the actorRefs to make the ruby method invocation a actor sendOneWay:

class ActorRefHandler
    public_instance_methods.each do |m|
      undef_method m unless m =~ /^__/ or m == 'to_s'
    end

    def initialize(actorRef)
      @actorRef = actorRef
    end

    def method_missing(name, *args, &block)
      @actorRef.sendOneWay MethodParameters.new name, args, block
    end
  end

Which is a pretty standard implementation of message forwarding in ruby: remove all instance methods (except the really private ones, such as __send__), making sure all method calls are forwarded to method_missing.

With all of this we can write a simple example of making any object an actor:

require 'akka'
class HelloWord
  def hi
    puts "hello actor world"
  end
end
Actors.actorOf(HelloWord.new).hi
Actors.delayedShutdown 1

Making it faster

In the heart of all of this lies the problem: making code runs faster by using the machine’s cores more effectively. Here we build the good old canonical map-reduce example: word count. We will count 5.4 MB of Shakespeare‘s texts. The example consists of 3 types of actors: a producer, mappers, and one reducer. The producer generates the chunks of lines to the mappers, which count the words on each chunk and generate a hash of word:count pairs, which the reducer aggregates into a hash of its own.

require 'akka'
require 'regular_word_count'
include Actors
module AkkaDispatcher
  include_package 'se.scalablesolutions.akka.dispatch'
  def self.workStealer(name)
    Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher 'mappers'
  end
end

file = File.join(File.dirname(__FILE__), 'shakespeare.txt')
input = IO.readlines(file).each_slice(500).map &:join

This code setups up the code to later define WorkStealer, so that our map actors can share the same message queue. We also load the file in memory, split into 500 lines chunks. If the chunks are too small, the mappers will receive too many messages, which makes the code go slow. If the chunks are too big, the job will not be split evenly among the mappers[5].

start = nil
values = Hash.new 0
linesToRead = input.size
reduceActor = actor do |message|
  linesToRead -= 1
  hash = message
  hash.each do |key, value|
    values[key] += value
  end
  if linesToRead == 0
    puts ">> All over: Just to say we used any computed value: #{values['shakespeare']}"
    finish = Time.now
    puts ">> Total time: #{finish - start}s"
    Akka::ActorRegistry.shutdownAll()
  end
end

The reducer actor is pretty straightforward. When all chunks are read, he shutdowns all actors and outputs the result and the time it took for the whole map-reduce chain to take place.

mapActorsSize = 2
mapActors = []
wordCount = WordCount.new
workStealer = AkkaDispatcher.workStealer 'mappers'
mapActorsSize.times do
  mapActor = actor do |message|
    reduceActor.sendOneWay wordCount.count message
  end
  mapActor.setDispatcher workStealer
  mapActors.push mapActor
end

The mappers delegate the actual work to an immutable WordCount class. The important part is the one that sets the same dispatcher on the actors. More on how this work on Akka’s documentation.

mapActor = mapActors.first
producer = actor do |message|
  for line in input
    mapActor.sendOneWay line
  end
end

allActors = [reduceActor, producer] + mapActors
allActors.each do |a|
  a.start
end
start = Time.now
producer.sendOneWay :start

These lines define the producer, start all actors, set the start time, and send the producer actor a message, which begins the map-reduce chain. It is important to note that the program’s main thread finishes on the last line. This example shows how It is possible to make it wait for the result and then resume the main thread (it requires using the other types of message sending methods, thus I’ll not cover it in detail).

Results: The sequential version runs on my machine (which has 2 cores) in about 4 seconds. This one with map-reduce actors take about 3 seconds, which yields a 25% improvement[6].

Conclusion

This post showed how it is easy to use Akka actors with JRuby and that they can easily enable thread-safe and easy to reason multicore programming. The Akka project has many other tools to help with distributed/parallel programming, such as remote actors, software transactional memory, and integrations with all sorts of persistence/queue systems. This post barely scratches the surface.

All the code on this blog post can be found on github, where all dependencies are easily available, and instructions on how to easily run the code. Give it a try, and see if you agree (or not) with others that writing parallel code can be much easier and fun.

Footnotes

[1] As Daniel Spiewak showed it on his Integrating Scala into JRuby post.

[2] Incidentally Akka was started by Jonas Bonér, who is also one of the creators of the java AOP tool AspectWerkz, which is included in Aspectj nowadays.

[3] From a interview with Joe Armstrong, the creator of Erlang:

Actually it’s a kind of 180 degree turn because I wrote a blog article that said “Why object-oriented programming is silly” or “Why it sucks”. I wrote that years ago and I sort of believed that for years. Then, my thesis supervisor, Seif Haridi, stopped me one day and he said “You’re wrong! Erlang is object oriented!”

[4] In general, you need to create a UntypedActorFactory to pass arguments to the constructor, which we already do (implicitly, using JRuby’s closure to interface coercion), but even then Ruby’s Actors will not work. This could be worked around by changing the Actors module and invoking another hook method that is not initialize.

[5] Thanks to the Akka committers Viktor Klang and Peter Veentjer for the hint.

[6] This is not a real benchmark. Making a real one requires a lot more attention to details like jvm warm-up, jiting from ruby to java, jiting on java bytecode, and so on.

Tagged with: , , , ,

RubyUnderscore: A bit of Arc and Scala in Ruby

Posted in Languages, Software Development by Daniel Ribeiro on October 31, 2010

A few months ago I’ve mentioned one thing that bothered me in ruby was No way to create simple blocks”. This is in contrast to other languages, such as Scala, Clojure and Groovy’s underscore, percent and “it”, respectively, shortcut notations. There are other languages with equivalent mechanisms as well. Even newer languages like Coffeescript have considered adding it. As James Iry mentioned, such constructs are in fact related to delimited continuations.

However ruby has Syntax Tree manipulation (via Parse Tree gem). Using it I created the RubyUnderscore project, which brings this ruby, using the underscore symbol (just like Scala and Arc). With it, it is possible to refactor the following:

    classes.reject { |c| c.subclasses.include?(Enumerable) }
    dates.select { |d| d.greater_than(old_date) }
    collection.map { |x| x.invoke }

into:

    classes.reject _.subclasses.include? Enumerable
    dates.select _.greater_than old_date
    collection.map _.invoke

The last case can also use symbol to proc coercion (appending & to symbol):

     collection.map &:invoke

However, the proc coercion is not flexible enough to allow arguments or invoke a method chain. Which I think brings a small increase in readability and code quality, not to mention that by making closures easier to declare, it fosters them to be used more. I find this to be a good thing, specially when you start to refactor your loops into maps, selects, rejects, group_bys and reduces

This also highlights another issue I mentioned in Improving Ruby: that Syntax Tree manipulation is too important to be supported only on MRI, and not throughout the implementations, like JRuby and Rubinius. Python has this built in into its standard library (through ast module and inspect.getsource), and Lisp can also do syntax tree manipulation with its macro system. The importance of such capability was mentioned by Paul Graham (one of the creators of Arc, which is a dialect of Lisp):

Letting people rewrite your language is a good idea. You, as the language designer, can’t possibly anticipate all the things programmers are going to want to do with it. To the extent they can rewrite the language, you don’t have to.

However, syntax tree manipulation in ruby is not only unsupported in most implementations, but it is also poorly documented (even though PostRank’s founder Ilya Grigorik‘s post on the subject is a very good introduction) and a bit awkward to use: the visitor from sexp-processor gem embraces side effect (mutating all the tree nodes while processing them) and the tree nodes are just arrays, unlike Python’s modules where there is a class for every node type. It is important to note that if you are willing to pre-process your ruby code, you can use ruby2ruby to generate the equivalent and regular ruby code, which will work all over.

These techniques are expected to be fixed as more people realize the gains they bring, and these improvements find their way into YARV and eventually other ruby implementations. Ruby is a very nice, clean, productive and elegant language, and it would be shame if we stopped making it even better.

Tagged with: , , ,

Improving Ruby

Posted in Languages by Daniel Ribeiro on March 31, 2010

Some may say Ruby is a bad rip-off of Lisp or Smalltalk, and I admit that. But it is nicer to ordinary people.

Yukihiro “Matz” Matsumoto, LL2

When I heard the Smalltalk traits of Ruby, I was intrigued. When I learned more, I enjoyed Ruby’s similarities with one of the most beautiful and powerful languages I’ve known. As I dug deeper, I enjoyed more of its wonderful metaprogramming abilities, which makes Ruby’s classes a lot more dynamic and easy to declare than Smalltalk’s. After reading this in-depth comparison of both, and Kent Beck’s article on the incompatibilities of Smalltalk’s VM implementations, I realized that I was generally more in favor of Ruby than Smalltalk (even though I fear that What Killed Smalltalk Could Kill Ruby, Too).

But Ruby is not perfect. In all fairness, its creator claims that it’s just plain impossible to design a perfect language. But I do believe it could be a bit better. People do point out some controversial rough edges, but these seem a bit trifle when compared to what really bothers me:

  • No scoped open classes. It is an issue that is actually being considered to be solved on 2.0 (and there are some branches of ruby that enable it), but, for the time being, there is no way to make the changes made by opening an existing class only apply to objects while being used on the a lexical scope. This is not the same as adding and removing the changes as, in this case, calls that fall out of scope (such as to another module or file) still see the changes. It would be nice to have both ways of open classes: scoped and not scoped.
  • Method arguments do not interleave with the method’s name (like in Smalltalk). Example: Instead of calling: File.fnmatch(‘*’, ‘/’, File::FNM_PATHNAME) you’d be to call it like: File.fn match: ‘*’ path: ‘/’ flags: File::FNM_PATHNAME. This seems weird, but it is a very powerful feature that allows method invocation to be descriptive, similar to Python’s named arguments or even Ruby’s named arguments with a hash. On the other hand it has a cleaner syntax than the former, and does not require checking hash keys as the later (the later is still useful for methods that want to receive an arbitrary list of named arguments). It would change a bit of the syntax of method_missing and how to deal with varargs and blocks for each parameter, but these can be dealt with as Scala does with its parameter lists.
  • There is no method called (). Procs (who are the obvious beneficiaries of such change ) use the method call, but one could be an alias to the other. There would be a mild ambiguity here, when you call the function that was just returned. For instance, imagine func is a method that receives no arguments and returns a function (that is: an object that implements the () method) wich also receives no arguments. Then func() could mean 1. call func, and 2. call the function returned by func. I am aware this is kinda of a sensitive topic, but the Scala approach to this issue is very simple: func and func() are the same (provided func can be called with no parenthesis). If you want to call the returned function in the same expression, use func()(). With the alias, you’d be able to call it like func.call() , func().call() or even func.call
  • No way to create simple blocks. It would be nice to have something similar to Scala’s underscore or Groovy’s it, which allows method invocations like: collection.map(get_square_of(_)) in Scala and collection.map(get_square_of(it)) in Groovy. Ruby’s symbol coercion to proc (&:method) does not really work on anything besides methods of the arguments of the block.
  • Difficulties of composing callback methods. You could be sure to always invoke super on them, and even meta-program all the classes/objects that do not do such thing. However, it is not easy to actually see which methods will be called, or even manipulate/re-prioritize the blocks of code on runtime (kinda like a Chain of Responsibility), which can be very bad, as these methods can modify a lot of behaviour throughout the Object Space.
  • The reflection API could be more thorough. For instance, you can’t get the source code of a method/class, etc (as you can in Python). You can use Parse Tree and Ruby2Ruby to do it, but Parse Tree is not portable (does not even work on ruby 1.9) and the output can be formated differently than the actual source code (which can be critical on DSLs). Also, methods added do not have information on which line of code they were added (which is less important when adding methods the recommended way: extending/including Modules), and properties created with class methods (such as those created by attr_reader, or some other libraries equivalents) can’t be discovered on runtime (they are like any other method, with no other meta-data whatsoever). Ruby also seems to be missing some helper methods, such as #metaclass.
  • No support for immutability. This is kinda nitpicking, but using recursive freeze (as noted by Dean Wampler) is not really practical (as it is really slow). Neither does it encompass immutable local variables. This is not only useful for concurrency and functional programming issues, but is also useful when writing code that is side-effect free so that it is easier to reason about.
  • The return value of a setter method (that is: one that ends with the equals symbol) is the argument, not the return value of the method. This is an issue that matters more when using immutable objects, as the only way for them to “mutate” is to return a new object. Therefore you can’t use a setter method on an immutable object, as, even if it returns a new one, the runtime will ignore the return value and set to the variable the argument that was received. On the other hand, I don’t think this can be changed without breaking a lot of existing code.

Several of this annoyances can be solved with a heavy dose of open classes, s-expressions manipulation (using Parse Tree) and meta-programming in general. Knuth has said that: Language designers also have an obligation to provide languages that encourage good style, since we all know that style is strongly influenced by the language in which it is expressed. Fully agreeing with such Sapir-Whorf-esque sentence, I feel it would be a good thing if the underlying listed solutions were built into the language itself (and supported across implementations, such as JRuby, Iron Ruby, Rubinius, Maglev), as it would not only improve, even if a little bit, the language itself, but also they way its users write code.

Update: Thanks Michael Fellinger for noting that ruby blocks are fully adherent to method definitions on 1.9, as they allow both default parameters and blocks as arguments.

Scala: The Successor to the Throne

Posted in Languages by Daniel Ribeiro on July 29, 2009

It has been a while since Java was the sole language running over a JVM. Scala is another such language which gained a lot attention recently for being used to scale Twitter‘s backend. Scala differs from most other languages that run on the JVM, such as Groovy, JRuby and Jython, as it is statically typed. This means that, similar to Java and C#, the types must be known at compile time. Scala is usually introduced as being both OO and functional. While this statement is true (and daunting, as many people are uncomfortable with the f*** word), it fails to grasp the important aspects of Scala.

Among the most direct benefits of using Scala feature:

  • Compatible with Java. Kinda obvious (as so are all the other 200+ languages over the JVM), but it is such an important feature that should not be overlooked. This means that Scala can use all Java libraries and frameworks. Which shows respect for people’s and companies investment on the technology.
  • Joint Compilation. This means that, like Groovy, Scala classes are compiled to Java classes, and therefore can be used on Java projects (even by java classes on the same project they are defined). Even if your team decides to make the complete move towards Scala, this can be useful integrating with dynamic languages via JSR 223.
  • Type Inference. If the compiler can guess the type (and it usually can), you don’t have to tell it. This allows Scala code to be as concise as dynamic languages, while still being type safe.
  • Implicit conversion allows you to achieve in a type safe way what extension methods do for C# and open classes (mostly) do for ruby. That is: add methods to types you might not have not defined yourself (such as strings, lists, integers). This is one of the features that make Scala DSL friendly.
  • Object immutability is encouraged and easy to accomplish. Scala even comes with immutable collections built-in.
  • Getters and Setters are automatically generated for you. If you don’t want them (if you only want setters for example), you have to explicitly make them private. Which is not a problem, as the common case is to want them.
  • Scala has first-order functions and implements an enumeration protocol (with the iterable trait), which helps keeping code clearer, more concise, and brings several other benefits.
  • The Actor programming model eases up the development of highly concurrent applications.
  • Exceptions don’t have to be explictly caught or thrown. It can be argued that having checked exceptions does more harm than good.

These features alone would be enough to make Scala a very interesting language, and worth being heralded as the current heir apparent to the Java throne by one of JRuby’s creator, Charles Nutter (a view somewhat shared by Neal Gafter). Or even worth of being endorsed both by Groovy’s creator, James Strachan, and by the inventor of Java, James Gosling.  Nonetheless Scala is deep, and there are several exciting advanced features that allow developers to be more productive. But learning such features before getting a good grasp the basics can be quite frustrating, more so without a good supporting literature (such as IBM‘s, Aritma‘s, Jonas Bonér‘s, Daniel Spiewak‘s, Sven Efftinge‘s, the official one, and several others). However it quite is feasible, not only encouraged, to delve into deeper concepts as you need them.

Even though Scala has academic roots (as it shows on its papers page, and some advanced concepts these tackle), Scala has been successfully used on enterprise projects, besides Twitter, such as Siemens, Électricité de France Trading and WattzOn.

Besides all the good points, Scala does have some rough edges. Even though many people are working on overcoming them, they are likely to be relevant on the short term:

  • Incipient IDE support. As Lift‘s author expressed, IDEs for Scala, while undergoing a lot of development, are not what they are for Java. There is poor refactoring support, code completion and unit test integration. Not to mention the fact that most framework support tools will not play nicely with Scala. This can also put off some newcomers, as an IDE can help people learn the language. On the other hand, Martin Folwer relativizes this IDE situation, as a language that allows you to be more productive can more than make up for the lack of sophisticated tools.
  • Joint Compilation is not supported by most IDEs as well. Again, likely to change as Scala grows in popularity.
  • Immutability on a class is not really immutability, since referring objects may not be immutable themselves. And there is no way at the moment to ensure the whole object graph is immutable.
  • Making JSR 223 work perfectly with Scala can be challenging. On the other hand, making it work good enough is quite attainable.
  • Scala doesn’t support metaprogramming. This can be worked around by combining it with dynamic languages, such as Ruby (following a polyglot programming approach), but if you are going to do heavy use of metaprogramming, than using a whole different language may be a better solution (Fan is another static type language that runs over the JVM, similar to Scala, that has metaprogramming support).
  • Frameworks that expect Java source, such as the client-side GWT, will not play nicely with Scala (note that people have made Scala work with GWT on the server-side though). However there is an ongoing project that will translate Scala into Java source.
  • The syntax and some concepts are bit different from Java, such as: inverted type declaration order, underscore being used instead of wildcards, asterisks and default values, many kinds of nothing, no static methods (you need to use singleton objects instead) and other minor things. The documentation walks through this quite nicely though, but keep in mind that it is not an automatic transition from writing Java to writing Scala code.

As Joe Armstrong said, the need for languages that allow developers to easily make use of CPUs with multiple cores will only increase as such CPUs become cheaper and gain more and more cores. Scala is quite suited for such task, while Java’s development is stuck dealing with issues that come from being widely deployed, uncertanties of how open it will be in the future and political issues with some of its main contributors. Given the situation, Scala seems to fit quite nicely the role of the successor to Java’s throne.

Follow

Get every new post delivered to your Inbox.