Metaphysical Developer

High Level Concurrency with JRuby and Akka Actors

Posted in Languages, Systems by Daniel Ribeiro on December 16, 2010

Many developers are used to low-level concurrency primitives, such as locks, monitors and semaphores. Java also has higher level concurrency utilities such as Atomic Objects and Fork/Join framework. Such primitives still require a lot of attention to shared variables, and are very easy to get wrong. Ilya Grigorik recently discussed other models of concurrency on his recent post Concurrency with Actors, Goroutines & Ruby, where he even introduced a ruby port of Go‘s concurrency mechanism. These models attempt to make concurrent programming easier.

The actor model is another very simple high level concurrency model: actors can’t respond to more than one message at a time (messages are queued into mailboxes) and can only communicate by sending messages, not sharing variables. As long as the messages are immutable data structures (which is always true in Erlang, but has to be a convention in languages without means of ensuring this property), everything is thread-safe, without need for any other mechanism. This is very similar to request cycle found in web development MVC frameworks.

Scala is famous for coming with an Actor library built-in. However, using Scala libraries in Ruby is not easy[1]. Akka is another great project that implements Actors, however it has a java Api, which makes the JRuby integration easier. Why JRuby? Not only to access Akka’s actor library, but also because JRuby is one of the few ruby implementations that doesn’t have the GIL, therefore it allows true concurrency for all types of applications (IO bounded or not).

Integrating Akka with JRuby

For starters, let’s first create a simple actor in Java:

public class PingActor extends UntypedActor {
	public void onReceive(Object message) throws Exception {
	    if (message instanceof String) {
	    	System.out.println("!!! Acted on: " + message);
	    }
	    else throw new IllegalArgumentException("Unknown message:" + message);
	}
}

This simple actor will just output any message it receives prefixed with “!!! Acted on: “, and will throw exception on any message that is not a string. This example show how simple it is to define an actor: just define a onReceive method that is called whenever a message is sent.

To see this actor working, we need four lines:

		ActorRef actor = actorOf(PingActor.class).start();
		actor.sendOneWay("hello actor world");
		TimeUnit.SECONDS.sleep(1);
		ActorRegistry.shutdownAll();

The first gets an actor reference and starts it. It is important to note that you cannot create an actor just by invoking new. Not in java or scala (we can solve this in Ruby). This is because there is a lot of AOP going on the background[2]. The second line just sends the message to the actor asynchronously. There two other ways of sending messages, which I’ll not cover, but you can read more in Akka’s documentation.

The last two lines just give time to the message reach the actor (remember, the sendOneWay method is non-blocking), and stops all actors on the system. Pretty simple right? Let’s see how we can do the same in JRuby. Setting up the stage:

require 'java'
module Akka
  include_package 'se.scalablesolutions.akka.actor'
end

These lines enable java and make a ruby module with all the classes of se.scalablesolutions.akka.actor package. Basic JRuby setup. Now on to defining the actor:

class PingActor < Akka::UntypedActor
  def self.create(*args)
    self.new(*args)
  end

  def onReceive(message)
    puts "!!! Acted on: #{message}"
  end
end

Here we have our first differences. The onReceive is just a cleaner version of the Java one. No type annotations, no type checking and a simpler string output. However, we have to define a classmethod called create, which just invokes new. This method seems to be created by the AOP part of Akka, which doesn’t seem to work on Ruby subclasses of UntypedActor. However, we can defined it ourselves. Now to actually using the actor:

actor = Akka::UntypedActor.actorOf(PingActor).start
actor.sendOneWay "hello actor world"
sleep 1
Akka::ActorRegistry.shutdownAll

Pretty much the same four lines as on the Java version, with a little less parenthesis, and a terser sleep method. The ruby code can be found on this page, and Java code here.

Fixing the Ruby Interface

Much of the code in the former example is infrastructure, but we can work around the static nature of Java classes in ruby. As factored out in akka.rb, we can gather this functionalities into a base class, and rewrite the first example in 8 lines:

require 'akka'
class PingActor < Actors::Base
  def onReceive(message)
    puts "!!! Acted on: #{message}"
  end
end
PingActor.spawn.sendOneWay "hello actor world"
Actors.delayedShutdown 1

Using closures we can even enhance our JRuby api with some ideas from the Scala api, making it down to 3 lines:

require 'akka'
Actors.spawn { |m| puts "!!! Acted on: #{m}" }.sendOneWay "hello actor world"
Actors.delayedShutdown 1

In this example spawn takes a block, and creates an actor that executes it every time it receives a message. As in the Scala API, spawn starts the actor as well as creating it.

But every Object is an Actor!

Alan Kay, the inventor of Smalltak and of the term OO, once said:

I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea. The big idea is “messaging”.

This is one of the reasons that Erlang with its actors form an object oriented language[3]

If we look into the resemblance of sendOneWay and the reflective method invocation, which in Ruby is made through send or __send__, it is quite easy to adapt the ruby method invocation to Actor message sending. We start with a simple delegator:

MethodParameters = Struct.new :name, :args, :block

class DelegatorActor < Base
    def self.new(target)
      ret = super()
      ret.instance_variable_set(:@target, target)
      return ret
    end

    def onReceive(message)
      param = message
      @target.__send__ param.name, *param.args, &param.block
    end
  end

The important part is the onRecieve message, which takes a MethodParameters object and invoke on the target. The caveat here is that we need to override the new method, because, for some reason, ruby subclasses of Akka UntypedActors will not invoke the initialize method with the arguments passed[4]. However, by turning any object into an actor, this can be the only place such hack is needed.

Now, the next step: adapting the actorRefs to make the ruby method invocation a actor sendOneWay:

class ActorRefHandler
    public_instance_methods.each do |m|
      undef_method m unless m =~ /^__/ or m == 'to_s'
    end

    def initialize(actorRef)
      @actorRef = actorRef
    end

    def method_missing(name, *args, &block)
      @actorRef.sendOneWay MethodParameters.new name, args, block
    end
  end

Which is a pretty standard implementation of message forwarding in ruby: remove all instance methods (except the really private ones, such as __send__), making sure all method calls are forwarded to method_missing.

With all of this we can write a simple example of making any object an actor:

require 'akka'
class HelloWord
  def hi
    puts "hello actor world"
  end
end
Actors.actorOf(HelloWord.new).hi
Actors.delayedShutdown 1

Making it faster

In the heart of all of this lies the problem: making code runs faster by using the machine’s cores more effectively. Here we build the good old canonical map-reduce example: word count. We will count 5.4 MB of Shakespeare‘s texts. The example consists of 3 types of actors: a producer, mappers, and one reducer. The producer generates the chunks of lines to the mappers, which count the words on each chunk and generate a hash of word:count pairs, which the reducer aggregates into a hash of its own.

require 'akka'
require 'regular_word_count'
include Actors
module AkkaDispatcher
  include_package 'se.scalablesolutions.akka.dispatch'
  def self.workStealer(name)
    Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher 'mappers'
  end
end

file = File.join(File.dirname(__FILE__), 'shakespeare.txt')
input = IO.readlines(file).each_slice(500).map &:join

This code setups up the code to later define WorkStealer, so that our map actors can share the same message queue. We also load the file in memory, split into 500 lines chunks. If the chunks are too small, the mappers will receive too many messages, which makes the code go slow. If the chunks are too big, the job will not be split evenly among the mappers[5].

start = nil
values = Hash.new 0
linesToRead = input.size
reduceActor = actor do |message|
  linesToRead -= 1
  hash = message
  hash.each do |key, value|
    values[key] += value
  end
  if linesToRead == 0
    puts ">> All over: Just to say we used any computed value: #{values['shakespeare']}"
    finish = Time.now
    puts ">> Total time: #{finish - start}s"
    Akka::ActorRegistry.shutdownAll()
  end
end

The reducer actor is pretty straightforward. When all chunks are read, he shutdowns all actors and outputs the result and the time it took for the whole map-reduce chain to take place.

mapActorsSize = 2
mapActors = []
wordCount = WordCount.new
workStealer = AkkaDispatcher.workStealer 'mappers'
mapActorsSize.times do
  mapActor = actor do |message|
    reduceActor.sendOneWay wordCount.count message
  end
  mapActor.setDispatcher workStealer
  mapActors.push mapActor
end

The mappers delegate the actual work to an immutable WordCount class. The important part is the one that sets the same dispatcher on the actors. More on how this work on Akka’s documentation.

mapActor = mapActors.first
producer = actor do |message|
  for line in input
    mapActor.sendOneWay line
  end
end

allActors = [reduceActor, producer] + mapActors
allActors.each do |a|
  a.start
end
start = Time.now
producer.sendOneWay :start

These lines define the producer, start all actors, set the start time, and send the producer actor a message, which begins the map-reduce chain. It is important to note that the program’s main thread finishes on the last line. This example shows how It is possible to make it wait for the result and then resume the main thread (it requires using the other types of message sending methods, thus I’ll not cover it in detail).

Results: The sequential version runs on my machine (which has 2 cores) in about 4 seconds. This one with map-reduce actors take about 3 seconds, which yields a 25% improvement[6].

Conclusion

This post showed how it is easy to use Akka actors with JRuby and that they can easily enable thread-safe and easy to reason multicore programming. The Akka project has many other tools to help with distributed/parallel programming, such as remote actors, software transactional memory, and integrations with all sorts of persistence/queue systems. This post barely scratches the surface.

All the code on this blog post can be found on github, where all dependencies are easily available, and instructions on how to easily run the code. Give it a try, and see if you agree (or not) with others that writing parallel code can be much easier and fun.

Footnotes

[1] As Daniel Spiewak showed it on his Integrating Scala into JRuby post.

[2] Incidentally Akka was started by Jonas Bonér, who is also one of the creators of the java AOP tool AspectWerkz, which is included in Aspectj nowadays.

[3] From a interview with Joe Armstrong, the creator of Erlang:

Actually it’s a kind of 180 degree turn because I wrote a blog article that said “Why object-oriented programming is silly” or “Why it sucks”. I wrote that years ago and I sort of believed that for years. Then, my thesis supervisor, Seif Haridi, stopped me one day and he said “You’re wrong! Erlang is object oriented!”

[4] In general, you need to create a UntypedActorFactory to pass arguments to the constructor, which we already do (implicitly, using JRuby’s closure to interface coercion), but even then Ruby’s Actors will not work. This could be worked around by changing the Actors module and invoking another hook method that is not initialize.

[5] Thanks to the Akka committers Viktor Klang and Peter Veentjer for the hint.

[6] This is not a real benchmark. Making a real one requires a lot more attention to details like jvm warm-up, jiting from ruby to java, jiting on java bytecode, and so on.

Tagged with: , , , ,
Follow

Get every new post delivered to your Inbox.