Metaphysical Developer

Conjcraft: A Minecraft Mod implemented in Clojure

Posted in hacking, Languages, Minecraft by Daniel Ribeiro on April 20, 2012

When you don’t create things, you become defined by your tastes rather than ability. Your tastes only narrow & exclude people. So create.

– why the lucky stiff

TL;DR: Source here, and here is a video of the mod in action:

Conjcraft

Conjcraft is a simple and extensible Mod for Minecraft written in Clojure (and some Java). Besides introducing two new blocks (Clojure and Github, which is hosting the source here), it brings an extremely simple and small DSL for writing Minecraft recipes.

The recipe DSL improves on Minecraft’s original one (which is already terse for a Java DSL). Compare these simple ones:

addRecipe(new ItemStack(Block.rail, 16), new Object[]
                {
                    "X X", "X#X", "X X", 'X', Item.ingotIron, '#', Item.stick
                });

(recipe-dsl {\X :ingotIron \# :stick}
  "X X
   X#X
   X X" 'rail 16)

Small explanation: the Clojure version is essentially the ascii art of this recipe:

Disclaimer: I’ll not try to teach Clojure here (besides saying it is a Lisp). If you need more info, there are great resources on the web.
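To make the translation concrete, here is a rough Ruby sketch of the idea behind the DSL: split the ASCII art into rows, then append the character bindings that Minecraft's addRecipe expects. The helper name recipe_args is hypothetical and is not part of Conjcraft, and the sketch assumes no row starts or ends with an empty cell, since it strips the indentation from each line.

```ruby
# Hypothetical helper, not part of Conjcraft: turns an ASCII-art recipe
# into the flat argument list used by the Java addRecipe call.
# Assumption: rows never begin or end with a blank cell, because each
# line is stripped to remove source indentation.
def recipe_args(bindings, art)
  rows = art.lines.map(&:strip)                 # one string per crafting-grid row
  used = rows.join.delete(' ').chars.uniq       # characters that actually appear
  rows + used.flat_map { |ch| [ch, bindings.fetch(ch)] }
end

args = recipe_args({ 'X' => :ingotIron, '#' => :stick }, "X X\n   X#X\n   X X")
p args
```

The result has the same shape as the Object[] in the Java version: the three row strings first, followed by each character and the item it stands for.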

This gain in expressiveness (which comes from the fact that Clojure is far more expressive than Java) is compounded across multiple recipes, especially after defining a consistent character-to-block/item mapping:

(def char-block (create-input-char-binding
                  '(
                     d dirt
                     o cobblestone
                     g github
                     c clojure
                     r redstone
                     )))

Many recipes can use them:

(defn recipes []
  (recipe-dsl char-block
     "d
      d" 'github

     "o
      o" 'clojure

     "c
      c
      c" 'swordGold

     "c c
      c c" 'bootsGold

     "cgc
      cgc" 'bootsDiamond

     "ccc
      c c
      c c" 'legsGold

     "ccc
      cgc
      cgc" 'legsDiamond
    ))

And finally, all of this is encoded in plain-text Clojure files, stored in the conjcraft directory inside user.home (which on Linux and Mac OS is usually the user’s home directory, aka ~).

This way Conjcraft is very extensible, as it allows the users to add blocks and recipes, without requiring Eclipse or MCP, or to recompile and obfuscate the de-obfuscated Java code.

Such simplicity, though, did not come easily…

Origins

One of the things that has always amazed me about Minecraft is how simple its concept is. I believe this simplicity is actually paramount to its success: by giving you very solid and small building blocks (no pun intended), the game steps away and lets users create their own goals and shine on their own.

This simplicity also lets other developers step in and create a huge variety of amazing mods (one of my personal favorites is the Aether mod, for being a very ambitious project, and for showing how much great content you can create on top of such a simple and powerful platform).

“Simplicity Ain’t Easy”: Stuart Halloway masterfully made this argument, exploring what simple is (one of the key points being that simple is “not compound”), its importance, and how Clojure is a simple language, which actually makes it very powerful. Inspired by the simplicity and power of both Clojure and Minecraft (and continuing my healthy(?) obsession with Minecraft and Clojure), it seemed only natural for me to set out to create a simple mod on top of both platforms (natural because both of them run on top of the JVM).

Modding Minecraft with Java is quite straightforward with the help of Minecraft Coder Pack (aka MCP) and ModLoader. Calling Clojure from Java is also very straightforward, to the point that you basically need a Java class like this:

public class mod_Conjcraft extends BaseMod {
    public void load() {
        try {
            File file = new File(new File(System.getProperty("user.home"), "conjcraft"), "conjcraft_main.clj");
            System.out.println("Loading clojure mod files from " + file.getAbsolutePath());
            clojure.lang.Compiler.loadFile(file.getAbsolutePath());
            clojure.lang.RT.var("conjcraft", "call").invoke();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}

And then I was able to create a very small function, in 5 lines of Clojure, to add a recipe that would take one block of dirt and output 7 blocks of dirt:

(ns conjcraft)
(import '(net.minecraft.src Block ModLoader ItemStack))
(defn call []
  (let [dirt Block/dirt]
    (ModLoader/addRecipe (ItemStack. dirt 7) (to-array ["#" \# dirt]))))

This actually works pretty well when using Eclipse, or the recompile.sh script that comes with MCP. The fun really began when I started preparing to release it…

The 1st rule of the Obfuscator Club is:

You can’t defeat the obfuscator. This is actually really important. Minecraft is obfuscated in its original distribution, which makes a lot of sense for a proprietary and commercial game. MCP tools de-obfuscate the original Java code, giving methods and classes very straightforward and sensible names.

The problem is that, in general, gamers will need your mod in obfuscated form, as the game expects classes to use the obfuscated names. Therefore you absolutely must obfuscate your mod.

The 2nd rule of the Obfuscator Club is:

You can’t defeat the obfuscator.

Clojure does have the capability of generating .class files with its Ahead of Time (AOT) compiler. Since the obfuscator does not operate on Java source code, but on .class files, this could have helped. But it doesn’t. Other languages that run on the JVM, like Scala (which compiles to pretty Java-like bytecode) and Mirah (which can even compile to Java source code), can actually get around the obfuscator this way, as long as you don’t use features that require reflection.

To understand why it doesn’t work with Clojure, let me show you this simple AOT example:

(ns core
  (:gen-class :main true))

(defn -main []
  (println "Hello World!"))

With some help of JD-GUI we can see the equivalent Java code of the generated class files, in particular:

public class core
{
  private static final Var main__var = Var.internPrivate("core", "-main");
  private static final Var equals__var = Var.internPrivate("core", "-equals");
  private static final Var toString__var = Var.internPrivate("core", "-toString");
  private static final Var hashCode__var = Var.internPrivate("core", "-hashCode");
  private static final Var clone__var = Var.internPrivate("core", "-clone");

These seemingly innocuous lines actually break at runtime. This happens because the obfuscator has another very important property: it puts everything in the top-level namespace (no packages). Note that the package “core” is written as a literal string, which the obfuscator will not touch. And currently there is no way to use AOT with empty namespaces.

You could change the Clojure compiler, or use tools to manipulate the byte code on the class files, but there is actually a much simpler solution:

Breaking the rules: Defeating the Obfuscator

Clojure is famous for supporting one of the most powerful types of metaprogramming: template macros. I have not exploited it on the project because macros can be very hard to understand (think of them as functions that take code in its raw Abstract Syntax Tree form, and output another raw Abstract Syntax Tree), and I wanted to keep the project very accessible.

The point is that I used Clojure to generate Java source code at compile time (the type of metaprogramming you always have the option to use, no matter which platform or base language you build on).

This is done by the create_constants.clj script, which actually imports the de-obfuscated code and generates a Java file mapping all block, item and material names to their actual objects (the result cannot be published without breaking both Minecraft and MCP licenses, but reading the code you can get an idea of what the result looks like).

Using the property highlighted before, that literal strings will not be obfuscated, and knowing that the obfuscator will not obfuscate the attribute names of classes you create (it only strips their package), these static maps are available to be used directly by interpreted Clojure code.

The final element of defeating the obfuscator is the ExtendableBlock class. It essentially takes Clojure functions (the clojure.lang.IFn interface) and delegates methods to them (some methods have to be re-exposed even when public, as the original public method names will be obfuscated).

Conclusions

Modding Minecraft is extremely fun, and it gets a lot more enjoyable when doing it in languages that are fun to use. I’ve used Clojure here, but there are many other languages that could have been used. So have fun, and create.

Thanks

Thanks Notch for making Minecraft and supporting the modding community. Thanks to all the presenters at ClojureWest for inspiring me to bring Clojure to new places. Thanks Robert for making one of the best Minecraft modding tutorials out there. And finally thanks to all the creators of MCP and ModLoader for making modding a simpler and more pleasant experience.

ClojureScript vs Coffeescript

Posted in hacking, Languages by Daniel Ribeiro on August 28, 2011

A language that doesn’t affect the way you think about programming, is not worth knowing

Alan J. Perlis

Edit Feb/2014: Please note that this post is from 2011, a few weeks after Clojurescript was released. Things changed a lot in the mean time…

In the past few years Javascript has gained a lot of attention and ubiquity: HTML5 technologies leverage a lot of Javascript, which enables people to create amazing dream worlds (like in the ROME project) with WebGL; V8 brings a lot of JIT techniques to a JavaScript Virtual Machine, which helps Google Chrome be a very fast browser, and powers NodeJS (allowing people to create a web page in a single programming language).

Javascript also powers queries in NoSQL databases like Mongo and CouchDB, and it can be used when making QT applications, 3d Games in Unity and even mobile apps with frameworks like WebMynd and PhoneGap. It has been a long way from the old days when it was confined to the browser, and mostly used for form validation.

In spite of all of this attention, Javascript has been so misunderstood that attention to its Good Parts had to be drawn. It doesn’t help that its prototype-based OO was first introduced by the rather unknown language Self, despite the several advantages it has when compared to traditional class-based OO (see the paper Organizing Programs Without Classes, written by Google’s Senior VP of Operations Urs Hölzle, who, among other things, also contributed to key JIT techniques like polymorphic inline caching).

Therefore it is not surprising that there are many projects that compile existing languages to Javascript. Many of these were too focused on the web platform (like Google Web Toolkit and the amazing Cappuccino’s Objective-J). In more recent years, languages are targeting the whole JS ecosystem (which makes a very poignant argument that JavaScript is Assembly Language for the Web). Coffeescript is one such language, of which I am very fond (I’ve written about it recently).

About a month ago ClojureScript was released, porting the Clojure language from the Java ecosystem to the JS one. Clojure is quite an amazing effort of engineering, not only for being a very successful Lisp on the JVM, but also for its novel approach to handling time and state (which its creator, Rich Hickey, explains really well).

I was really excited to see the examples, but I was a bit bummed out that the most interesting example was a Twitter visualization tool (which feels a bit too much like a 2010 app). Since both CoffeeScript and Clojure are fun languages, and I wanted to see how ClojureScript would compare to CoffeeScript, I took the challenge and crafted a simple HTML5 physics-based game in both languages, using Box2dWeb, a js port of Box2D (the physics engine, created by Erin Catto, that is behind Angry Birds).

The game consists of clicking on the objects to destroy them, so that they don’t reach the top of the canvas (and, in a very Tetris-like fashion, the elements pop out faster the more you play). It really sticks to the bare minimum of Terrano’s Hierarchy of Gamer Needs. All the code is open source and can be found on Github. The CoffeeScript version’s source can be found here, and the ClojureScript version’s here.

Lessons Learned

Disclaimer: ClojureScript is pretty much in alpha status, so many things are likely to improve in the future.

Compiling: The first thing that really pops up is how fast Coffeescript compiles down to JS. The watch behavior allows you to fire the compilation process and forget it. ClojureScript takes me about 5 seconds to compile a single file. Granted, it gives warnings about unused/undefined variables, but I’d really prefer it to compile instantly and let the browser tell me this at runtime.

Namespaces: ClojureScript’s namespaces are implemented as global variables, which are shadowed by local variables with the same name. For instance, if you are in a namespace called game, don’t use local variables and arguments named game. This is really important, as ClojureScript will use the global namespace object for every single function defined in that namespace, so shadowing it is likely to give all sorts of errors.

ClojureScript is not Clojure: Fogus wrote an interesting piece on the lack of eval in ClojureScript. Even though I find it might make sense for some web pages, when making WebGL games, or even Canvas 2d games, the assets’ size can easily overshadow the entire library’s size. Which is not a big deal if you use HTML5’s Cache manifest. In the end, it felt very much like the opposite of the Lisp spirit (epitomized by Paul Graham in his Five Questions about Language Design: “Give the Programmer as Much Control as Possible”).

The documentation is quite clear that eval is not supported. What is not clear is that this argument against eval permeates many other functions: resolve is not implemented (and neither is ns-resolve, nor the *ns* definition). Without them, there is no way to transform a string into a function. For people more used to OO languages, like Javascript, Ruby and Python, this essentially means that ClojureScript doesn’t have any reflection APIs. In the game:

createElement: ->
  randomY = (0.2 + 0.4 * Math.random())*  H / @scale
  randomX = (Math.random() * (W - 50) + 25) / @scale
  type = @objectList[randomInt(@objectList.length)]
  @["create#{type}"] randomX, randomY, Math.random() + 1

The last line of the Coffeescript version uses reflection to get the correct method name (to decide to invoke createTriangle, createCircle or createSquare ). The ClojureScript version had to be translated into:

(defn- create-element [game]
  (let [randomY (/ (* H (+ 0.2 (* 0.4 (rand)))) scale)
        randomX (/ (+ 25 (* (rand) (- W 50))) scale)
        type (rand-nth [:circle :square :triangle])
        method (keyword (str "create-" (name type)))]
    ((@game method) game randomX randomY (inc (rand)))
    )
  )

Which is possible because the game has the respective functions as keyword attributes, which can be easily converted from string interpolation.
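The same workaround can be sketched in Ruby for comparison: instead of reflecting on method names, keep the functions in a map and build the lookup key from a string. The game hash and its lambdas below are hypothetical stand-ins for the real shape constructors, so treat this as an illustration of the lookup technique only.

```ruby
# Hypothetical stand-ins for the game's shape constructors; each one just
# returns a description of what it would create.
game = {
  create_circle:   ->(x, y) { [:circle, x, y] },
  create_square:   ->(x, y) { [:square, x, y] },
  create_triangle: ->(x, y) { [:triangle, x, y] }
}

def create_element(game, type, x, y)
  # Build the key from a string, mirroring (keyword (str "create-" (name type))).
  method = "create_#{type}".to_sym
  game.fetch(method).call(x, y)
end

p create_element(game, :square, 1.0, 2.0)
```

Using fetch rather than [] makes a misspelled type fail loudly with a KeyError instead of calling nil.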

User Macros are not supported: At the moment at least (support for them is likely to come in following updates). This makes the situation above much harder to take, but it is mostly due to ClojureScript’s alpha status. It does make the code a bit longer (the CoffeeScript version has 236 lines, while the ClojureScript one has 301 lines). In order to circumvent what I consider would be one of the ugliest bits caused by the lack of macros (several (set! (. obj attr) value) calls), I defined a js-set function:

(defn- js-set
  "Sets an attribute name to a value on a javascript object
Returns the original object"
  ([jsobject attr value]
    (do (native-set-wrapper jsobject attr value)
      jsobject))
  ([jsobject & values]
    (do (doseq [[attr value] (apply hash-map values)]
          (native-set-wrapper jsobject attr value))
      jsobject)))

This is actually really against Clojure’s spirit, as Clojure really promotes immutable data. However, Javascript libraries, in particular Box2dWeb, really expect mutable state. Therefore handling native js objects requires such functions (note that converting them to Clojure data and keeping them in Clojure land can be easily done with the nice undocumented js->clj function, which is actually used in TwitterBuzz).

Therefore we can write functions this way:

(defn- create-fixture
  ([shape] (js-set (b2FixtureDef.)
             :density 3
             :friction 0.3
             :restitution 0.9
             :shape shape))
  ([] (create-fixture nil)))

Instead of:

(defn- create-fixture
  ([shape] (let [f (b2FixtureDef.)]
             (set! (. f density) 3)
             (set! (. f friction) 0.3)
             (set! (. f restitution) 0.9)
             (set! (. f shape) shape)
             f))
  ([] (create-fixture nil)))

This makes it look a lot like the assoc function for creating maps with updated values. This version inspired me to refactor the Coffeescript version using a similar assoc function:

createFixture = (shape) ->
  f = new b2FixtureDef
  f.density = 3.0
  f.friction = .3
  f.restitution = .9
  f.shape = shape if shape?
  return f

became:

assoc = (o, i) -> o[k] = v for k, v of i; o

createFixture = (shape) ->
  assoc new b2FixtureDef,
    density: 3
    friction: .3
    restitution: .9
    shape: shape

This is quite similar to ClojureScript’s version (the colons are on the right instead of the left, and it requires a comma), and it reduces the amount of accidental complexity to a minimum.

Edit: Thanks everybody for pointing out that you can use macros with Clojurescript. However, at the moment, they are Clojure macros (so no js), and they require you to hack your Clojurescript setup to add the macro files to the classpath. Hiccups and cljs-3d are two projects that do this, so you can see in their build files how they do it. Even then, you still need to use require-macros. All of this makes macros less of a native feature of Clojurescript, and it makes it a lot harder to share code seamlessly.

IDE support: Clojure’s IDE support is really nice. Intellij’s La Clojure (available on its free Community Version) does a lot more than mere syntax highlighting: minimal refactoring support, rainbow parentheses, smart parentheses, syntax highlighted repl, great autocomplete support, awesome code navigation, autocomplete for java classes and live templates. And it works pretty well for ClojureScript as well. Other IDEs are also great, even though Emacs support can be a bit more intense in its setup (which is not something emacs users are unfamiliar with).

Even though I am really happy with Github founder Chris Wanstrath’s work on the Emacs mode for Coffeescript, it doesn’t have the same support that Clojure does. It is getting more and more support, but nowadays Clojure has the upper hand.

Debugging support: Browser support for debugging languages that compile down to javascript is coming, but at the moment Coffeescript compiles down to such readable JS that it is not a big problem. This is a known issue with ClojureScript at the moment, even in pretty-print compile mode.

Conclusions

Since Javascript on the web has a much simpler execution model than Java, Clojure’s amazing concurrency control mechanisms do not shine as much. Nevertheless ClojureScript is a delight to work with. As it moves out of its Alpha status, many of the issues are likely to be gone. I also expect it to support the full Clojure language (including things like resolve, letfn, macros and eval), as writing web apps in a single language on client and server is a really nice feature. This would also make ClojureScript even more interesting, as it would allow developers to leverage all the power of existing Clojure libraries in their Javascript work (and possibly use it in a more polyglot environment).

Coffeescript is more suitable for production apps right now, but it is nice to see all these developer efforts to allow people to be more productive and happy with their work on Javascript platforms (this way we don’t have to wait for Google’s NaCl and PNaCL, which promise to bring even more languages to the environment).

Peter Norvig’s Spelling Corrector in 21 Lines of Coffeescript

Posted in Languages by Daniel Ribeiro on March 31, 2011

Coffeescript is a very nice (and relatively new) language that compiles down to javascript, making web programming (and making firefox plugins, nodejs apps, and so forth) much more joyful. Its object model is the same as javascript’s (one of coffeescript’s mottos is Unfancy JavaScript), and its compiled js form is quite easy to read and debug. It has many niceties, including classes (effectively making the prototype chain a first class citizen of the language) and array/object comprehensions (heavily influenced by python’s list comprehensions).

Ruby also has an influence on the language, such as optional parentheses on method/function invocation. In fact, the original version of the Coffeescript compiler was written in Ruby (but nowadays coffeescript is a self-hosting language).

Coffeescript has been used by several projects, including a mobile framework written by 37signals, the creators of Rails. I’ve been using it for about one year (including some open source work using a HTML 5 Canvas framework called EaselJs, a port of ruby functionalities and even a Firefox plugin).

Because of all the Ruby and Python influence on the language, and the fact that Coffeescript can convey beautiful and concise code, I had a hunch that it could get a really good position on Peter Norvig’s Spelling Corrector implementation collection (Javascript’s version currently has 53 lines, which is a lot more than python‘s 21). With some work, I managed to implement it in 21 lines as well:

words = (text) -> (t for t in text.toLowerCase().split(/[^a-z]+/) when t.length > 0)
Array::or = (arrayFunc) -> if @length > 0 then @ else arrayFunc()
Array::flat = -> if @length == 0 then @ else @[0].concat(@[1..].flat())
train = (features) ->
 model = {}
 (model[f] = if model[f] then model[f] + 1 else 2) for f in features
 return model
NWORDS = train(words(require('fs').readFileSync('./lib/big.txt', 'utf8')))
alphabet = 'abcdefghijklmnopqrstuvwxyz'.split ""
edits1 = (word) ->
 s = ([word.substring(0, i), word.substring(i)] for i in [0..word.length])
 deletes = (a.concat b[1..] for [a, b] in s when b.length > 0)
 transposes = (a + b[1] + b[0] + b.substring(2) for [a, b] in s when b.length > 1)
 replaces = (a + c + b.substring(1) for c in alphabet for [a, b] in s when b.length > 0)
 inserts = (a + c + b for c in alphabet for [a, b] in s)
 return deletes.concat transposes.concat replaces.flat().concat inserts.flat()
known_edits2 = (word) -> ((e2 for e2 in edits1(e1) when NWORDS[e2]? for e1 in edits1(word)).flat())
known = (words) -> (w for w in words when NWORDS[w])
correct = (word) ->
 candidates = known([word]).or -> known(edits1(word)).or -> known_edits2(word).or -> [word]
 ({k: w, v: NWORDS[w] or 1} for w in candidates).sort((a, b)-> b.v  - a.v)[0].k

All the code is hosted on github. The code above can be seen in a more readable version here (after line 21 it also contains a full test, using a fixture generated by a slightly modified version of Peter Norvig’s original implementation). There is a more testable version, along with Jasmine BDD tests (which look a lot like Rspec’s in Coffeescript), which run headless on NodeJs, but work just fine in the browsers.

Considerations

A findall for regexes doesn’t exist natively in Javascript, however it is equivalent to splitting by the complementary regex (line 1).

Array::or (line 2) had to be implemented, because Python’s truthiness rules treat a collection (actually, any object that defines a length) as true as long as it is not empty. Array::flat (line 3) has to be implemented because Coffeescript’s loop comprehension is a bit different from python’s: double loops (example: x + y for y in col1 for x in col2) return an array of arrays instead of a single array.

Also note that loop comprehension’s order is inverted (x + y for x for y in python is translated as x + y for y for x).
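Both points can be checked quickly in Ruby, used here only as an analogy and not as part of the Coffeescript solution. String#scan plays the role of Python's findall, and nested map calls behave like the nested Coffeescript comprehensions, producing an array of arrays that has to be flattened. The sample text and variable names are made up for the illustration.

```ruby
text = "2011: Hello, world!"

# Python's re.findall('[a-z]+', ...) corresponds to String#scan in Ruby.
found = text.downcase.scan(/[a-z]+/)

# Splitting on the complementary character class yields the same words,
# plus an empty leading string when the text starts with a separator,
# which is why the Coffeescript version filters on t.length > 0.
raw_split = text.downcase.split(/[^a-z]+/)
split = raw_split.reject(&:empty?)

# A nested loop returns one inner array per outer element ...
nested = %w[a b].map { |x| %w[1 2].map { |y| x + y } }
# ... so one level of flattening is needed to get a single list.
flat = nested.flatten(1)

p found, raw_split, split, nested, flat
```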

Granted, this version runs really fast on NodeJs 0.4.1, and I was quite happy with how the resulting code looked. I was even happier that I did not have to write the compiled Javascript file and its whopping 147 lines of Spelling Corrector.

To see the reason why this works, check out Peter Norvig’s original post.

High Level Concurrency with JRuby and Akka Actors

Posted in Languages, Systems by Daniel Ribeiro on December 16, 2010

Many developers are used to low-level concurrency primitives, such as locks, monitors and semaphores. Java also has higher level concurrency utilities such as Atomic Objects and the Fork/Join framework. Such primitives still require a lot of attention to shared variables, and are very easy to get wrong. Ilya Grigorik discussed other models of concurrency in his recent post Concurrency with Actors, Goroutines & Ruby, where he even introduced a ruby port of Go’s concurrency mechanism. These models attempt to make concurrent programming easier.

The actor model is another very simple high level concurrency model: actors can’t respond to more than one message at a time (messages are queued into mailboxes) and can only communicate by sending messages, not sharing variables. As long as the messages are immutable data structures (which is always true in Erlang, but has to be a convention in languages without means of ensuring this property), everything is thread-safe, without need for any other mechanism. This is very similar to request cycle found in web development MVC frameworks.
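The mailbox idea can be sketched in a few lines of plain Ruby. This is a toy, not Akka: the ToyActor class, its send_one_way and shutdown methods, and the :stop sentinel are all invented for this illustration. A single thread drains a queue, so the handler only ever sees one message at a time.

```ruby
require 'thread'

# Toy actor, NOT Akka: a mailbox (Queue) drained by a single thread,
# so the handler processes one message at a time, in arrival order.
class ToyActor
  def initialize(&handler)
    @mailbox = Queue.new
    @thread = Thread.new do
      # :stop is a hypothetical sentinel used only by this sketch.
      while (message = @mailbox.pop) != :stop
        handler.call(message)
      end
    end
  end

  # Non-blocking send: the message is only enqueued here.
  # Freezing it is a nod to the "immutable messages" convention.
  def send_one_way(message)
    @mailbox << message.freeze
    self
  end

  # Let the actor finish its queued messages, then wait for its thread.
  def shutdown
    @mailbox << :stop
    @thread.join
  end
end

received = []
actor = ToyActor.new { |m| received << m }
actor.send_one_way("hello").send_one_way("actor world")
actor.shutdown
p received
```

Only the actor's thread touches received until shutdown has joined it, which is the property that makes the model easy to reason about.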

Scala is famous for coming with an Actor library built-in. However, using Scala libraries in Ruby is not easy[1]. Akka is another great project that implements Actors, and it has a Java API, which makes the JRuby integration easier. Why JRuby? Not only to access Akka’s actor library, but also because JRuby is one of the few ruby implementations that doesn’t have the GIL, therefore it allows true concurrency for all types of applications (IO-bound or not).

Integrating Akka with JRuby

For starters, let’s first create a simple actor in Java:

public class PingActor extends UntypedActor {
	public void onReceive(Object message) throws Exception {
	    if (message instanceof String) {
	    	System.out.println("!!! Acted on: " + message);
	    }
	    else throw new IllegalArgumentException("Unknown message:" + message);
	}
}

This simple actor will just output any message it receives prefixed with “!!! Acted on: ”, and will throw an exception on any message that is not a string. This example shows how simple it is to define an actor: just define an onReceive method that is called whenever a message is sent.

To see this actor working, we need four lines:

		ActorRef actor = actorOf(PingActor.class).start();
		actor.sendOneWay("hello actor world");
		TimeUnit.SECONDS.sleep(1);
		ActorRegistry.shutdownAll();

The first gets an actor reference and starts it. It is important to note that you cannot create an actor just by invoking new, neither in Java nor in Scala (we can solve this in Ruby). This is because there is a lot of AOP going on in the background[2]. The second line just sends the message to the actor asynchronously. There are two other ways of sending messages, which I’ll not cover, but you can read more in Akka’s documentation.

The last two lines just give the message time to reach the actor (remember, the sendOneWay method is non-blocking), and stop all actors in the system. Pretty simple, right? Let’s see how we can do the same in JRuby. Setting up the stage:

require 'java'
module Akka
  include_package 'se.scalablesolutions.akka.actor'
end

These lines enable java and make a ruby module with all the classes of se.scalablesolutions.akka.actor package. Basic JRuby setup. Now on to defining the actor:

class PingActor < Akka::UntypedActor
  def self.create(*args)
    self.new(*args)
  end

  def onReceive(message)
    puts "!!! Acted on: #{message}"
  end
end

Here we have our first differences. The onReceive is just a cleaner version of the Java one. No type annotations, no type checking and a simpler string output. However, we have to define a class method called create, which just invokes new. This method seems to be created by the AOP part of Akka, which doesn’t seem to work on Ruby subclasses of UntypedActor. However, we can define it ourselves. Now to actually using the actor:

actor = Akka::UntypedActor.actorOf(PingActor).start
actor.sendOneWay "hello actor world"
sleep 1
Akka::ActorRegistry.shutdownAll

Pretty much the same four lines as in the Java version, with fewer parentheses and a terser sleep method. The ruby code can be found on this page, and the Java code here.

Fixing the Ruby Interface

Much of the code in the former example is infrastructure, but we can work around the static nature of Java classes in ruby. As factored out in akka.rb, we can gather this functionality into a base class, and rewrite the first example in 8 lines:

require 'akka'
class PingActor < Actors::Base
  def onReceive(message)
    puts "!!! Acted on: #{message}"
  end
end
PingActor.spawn.sendOneWay "hello actor world"
Actors.delayedShutdown 1

Using closures we can even enhance our JRuby api with some ideas from the Scala api, bringing it down to 3 lines:

require 'akka'
Actors.spawn { |m| puts "!!! Acted on: #{m}" }.sendOneWay "hello actor world"
Actors.delayedShutdown 1

In this example spawn takes a block, and creates an actor that executes it every time it receives a message. As in the Scala API, spawn starts the actor as well as creating it.

But every Object is an Actor!

Alan Kay, the inventor of Smalltalk and of the term OO, once said:

I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea. The big idea is “messaging”.

This is one of the reasons that Erlang, with its actors, forms an object-oriented language[3].

If we look into the resemblance of sendOneWay and the reflective method invocation, which in Ruby is made through send or __send__, it is quite easy to adapt the ruby method invocation to Actor message sending. We start with a simple delegator:

MethodParameters = Struct.new :name, :args, :block

class DelegatorActor < Base
  def self.new(target)
    ret = super()
    ret.instance_variable_set(:@target, target)
    return ret
  end

  def onReceive(message)
    param = message
    @target.__send__ param.name, *param.args, &param.block
  end
end

The important part is the onReceive method, which takes a MethodParameters object and invokes the named method on the target. The caveat here is that we need to override the new method, because, for some reason, ruby subclasses of Akka UntypedActors will not invoke the initialize method with the arguments passed[4]. However, by turning any object into an actor, this can be the only place such a hack is needed.

Now, the next step: adapting the actorRefs to turn a ruby method invocation into an actor sendOneWay:

class ActorRefHandler
  # Method names are strings on Ruby 1.8 and symbols on 1.9+, so compare via to_s.
  public_instance_methods.each do |m|
    undef_method m unless m.to_s =~ /^__/ or m.to_s == 'to_s'
  end

  def initialize(actorRef)
    @actorRef = actorRef
  end

  def method_missing(name, *args, &block)
    @actorRef.sendOneWay MethodParameters.new name, args, block
  end
end

Which is a pretty standard implementation of message forwarding in ruby: remove all instance methods (except the really private ones, such as __send__), making sure all method calls are forwarded to method_missing.

With all of this we can write a simple example of making any object an actor:

require 'akka'
class HelloWord
  def hi
    puts "hello actor world"
  end
end
Actors.actorOf(HelloWord.new).hi
Actors.delayedShutdown 1
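
The forwarding idea can also be tried without Akka or JRuby. Below is a minimal sketch in plain Ruby, under stated assumptions: ToyActorRef is a hypothetical stand-in for an ActorRef (a mailbox drained by a single thread), Call mirrors the MethodParameters struct, and ForwardingProxy inherits from BasicObject instead of undefining methods by hand. None of these names come from Akka.

```ruby
require 'thread'

# Mirrors the MethodParameters struct used above (hypothetical name).
Call = Struct.new(:name, :args, :block)

# Hypothetical stand-in for an Akka ActorRef: one mailbox, one worker thread.
class ToyActorRef
  def initialize(target)
    @target = target
    @mailbox = Queue.new
    @thread = Thread.new do
      # Handle messages one at a time until the :stop marker arrives.
      while (msg = @mailbox.pop) != :stop
        @target.__send__(msg.name, *msg.args, &msg.block)
      end
    end
  end

  # Fire and forget: enqueue the message and return immediately.
  def sendOneWay(message)
    @mailbox << message
    nil
  end

  # Ask the worker to finish and wait until the mailbox is drained.
  def stop
    @mailbox << :stop
    @thread.join
  end
end

# BasicObject has almost no methods, so nearly every call hits method_missing.
class ForwardingProxy < BasicObject
  def initialize(ref)
    @ref = ref
  end

  def method_missing(name, *args, &block)
    @ref.sendOneWay(::Call.new(name, args, block))
  end
end

# Plain object that records the greetings it receives.
class Recorder
  attr_reader :seen

  def initialize
    @seen = []
  end

  def hi(name)
    @seen << "hello #{name}"
  end
end

recorder = Recorder.new
ref = ToyActorRef.new(recorder)
proxy = ForwardingProxy.new(ref)
proxy.hi('actor world') # queued, not executed on the calling thread
ref.stop                # wait for the queued call to be processed
```

The proxy never runs hi itself: it only packages the call and hands it to the mailbox, which is the same split of responsibilities as ActorRefHandler and DelegatorActor.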

Making it faster

At the heart of all of this lies the problem: making code run faster by using the machine’s cores more effectively. Here we build the good old canonical map-reduce example: word count. We will count the words in 5.4 MB of Shakespeare’s texts. The example consists of 3 types of actors: a producer, mappers, and one reducer. The producer sends chunks of lines to the mappers, which count the words in each chunk and generate a hash of word:count pairs, which the reducer aggregates into a hash of its own.

require 'akka'
require 'regular_word_count'
include Actors
module AkkaDispatcher
  include_package 'se.scalablesolutions.akka.dispatch'
  def self.workStealer(name)
    Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher name
  end
end

file = File.join(File.dirname(__FILE__), 'shakespeare.txt')
input = IO.readlines(file).each_slice(500).map &:join

This code sets up a helper that will later create the work-stealing dispatcher, so that our map actors can share the same message queue. We also load the file into memory, split into chunks of 500 lines. If the chunks are too small, the mappers will receive too many messages, which slows the code down. If the chunks are too big, the job will not be split evenly among the mappers[5].
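
To make the chunk size trade-off concrete, here is a small sketch in plain Ruby. The 2,000 generated lines and the chunk helper are made up for illustration; only the each_slice(...).map(&:join) idiom is taken from the code above.

```ruby
# Hypothetical input: 2,000 short lines instead of the Shakespeare file.
lines = Array.new(2_000) { |i| "line #{i}\n" }

# Same idiom as in the post: group lines and join each group into one string.
def chunk(lines, size)
  lines.each_slice(size).map(&:join)
end

tiny   = chunk(lines, 10)    # many small messages
medium = chunk(lines, 500)   # a handful of messages
huge   = chunk(lines, 2_000) # a single message

puts "10 lines per chunk:   #{tiny.size} messages"
puts "500 lines per chunk:  #{medium.size} messages"
puts "2000 lines per chunk: #{huge.size} message"
```

With two mappers, the single huge chunk would leave one of them idle, while the tiny chunks multiply the number of messages that have to be sent and handled.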

start = nil
values = Hash.new 0
linesToRead = input.size
reduceActor = actor do |message|
  linesToRead -= 1
  hash = message
  hash.each do |key, value|
    values[key] += value
  end
  if linesToRead == 0
    puts ">> All over: Just to say we used any computed value: #{values['shakespeare']}"
    finish = Time.now
    puts ">> Total time: #{finish - start}s"
    Akka::ActorRegistry.shutdownAll()
  end
end

The reducer actor is pretty straightforward. When all chunks have been processed, it outputs the result and the time the whole map-reduce chain took, and then shuts down all actors.

mapActorsSize = 2
mapActors = []
wordCount = WordCount.new
workStealer = AkkaDispatcher.workStealer 'mappers'
mapActorsSize.times do
  mapActor = actor do |message|
    reduceActor.sendOneWay wordCount.count message
  end
  mapActor.setDispatcher workStealer
  mapActors.push mapActor
end

The mappers delegate the actual work to an immutable WordCount class. The important part is the one that sets the same dispatcher on all the map actors. More on how this works can be found in Akka’s documentation.
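
The WordCount class lives in regular_word_count.rb and is not shown here. As a rough idea of what the map and reduce steps do, here is a sketch that is an assumption, not the actual class: SketchWordCount, its tokenizing regular expression, and merge_counts are all hypothetical.

```ruby
# Hypothetical stand-in for WordCount; it keeps no state between calls.
class SketchWordCount
  # Map step: returns a fresh hash of word => occurrences for one chunk.
  def count(text)
    counts = Hash.new(0)
    text.downcase.scan(/[a-z']+/) { |word| counts[word] += 1 }
    counts
  end
end

# Reduce step: merge the per-chunk hashes by summing their counts.
def merge_counts(partials)
  partials.each_with_object(Hash.new(0)) do |partial, total|
    partial.each { |word, n| total[word] += n }
  end
end

chunks = ["To be, or not to be\n", "that is the question\nTo be\n"]
counter = SketchWordCount.new
totals = merge_counts(chunks.map { |chunk| counter.count(chunk) })

puts totals['to']       # occurrences of "to" across both chunks
puts totals['question'] # occurrences of "question"
```

Because count builds a new hash on every call and touches no shared state, one instance can safely be shared by several mappers, which is what the code above does with wordCount.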

mapActor = mapActors.first
producer = actor do |message|
  for line in input
    mapActor.sendOneWay line
  end
end

allActors = [reduceActor, producer] + mapActors
allActors.each do |a|
  a.start
end
start = Time.now
producer.sendOneWay :start

These lines define the producer, start all actors, set the start time, and send the producer actor a message, which begins the map-reduce chain. It is important to note that the program’s main thread finishes on the last line. This example shows how it is possible to make it wait for the result and then resume the main thread (it requires using the other types of message sending methods, so I’ll not cover it in detail).
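
The idea of a main thread that blocks until a result arrives can be pictured in plain Ruby with a Queue. This is only an analogy under my own assumptions: it does not use Akka’s API, and the worker thread and its computation are made up for illustration.

```ruby
require 'thread'

results = Queue.new

# Hypothetical worker: does some work and posts the result when done.
worker = Thread.new do
  total = (1..100).reduce(:+)
  results << total
end

# Queue#pop blocks the calling thread until a value is available,
# so the main thread resumes only once the worker has answered.
answer = results.pop
worker.join

puts "worker answered: #{answer}"
```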

Results: The sequential version runs on my machine (which has 2 cores) in about 4 seconds. The version with map-reduce actors takes about 3 seconds, a 25% reduction in running time (a speedup of roughly 1.33x)[6].

Conclusion

This post showed how easy it is to use Akka actors with JRuby, and how they enable multicore programming that is thread-safe and easy to reason about. The Akka project has many other tools to help with distributed/parallel programming, such as remote actors, software transactional memory, and integrations with all sorts of persistence/queue systems. This post barely scratches the surface.

All the code from this blog post can be found on GitHub, along with all dependencies and instructions on how to run it. Give it a try, and see if you agree (or not) with others that writing parallel code can be much easier and more fun.

Footnotes

[1] As Daniel Spiewak showed it on his Integrating Scala into JRuby post.

[2] Incidentally, Akka was started by Jonas Bonér, who is also one of the creators of the Java AOP tool AspectWerkz, which is included in AspectJ nowadays.

[3] From an interview with Joe Armstrong, the creator of Erlang:

Actually it’s a kind of 180 degree turn because I wrote a blog article that said “Why object-oriented programming is silly” or “Why it sucks”. I wrote that years ago and I sort of believed that for years. Then, my thesis supervisor, Seif Haridi, stopped me one day and he said “You’re wrong! Erlang is object oriented!”

[4] In general, you need to create an UntypedActorFactory to pass arguments to the constructor, which we already do (implicitly, using JRuby’s closure to interface coercion), but even then Ruby’s Actors will not work. This could be worked around by changing the Actors module to invoke another hook method that is not initialize.

[5] Thanks to the Akka committers Viktor Klang and Peter Veentjer for the hint.

[6] This is not a real benchmark. Making a real one requires a lot more attention to details like JVM warm-up, JIT compilation from Ruby to Java bytecode, JIT compilation of the Java bytecode itself, and so on.


RubyUnderscore: A bit of Arc and Scala in Ruby

Posted in Languages, Software Development by Daniel Ribeiro on October 31, 2010

A few months ago I mentioned that one thing that bothered me in Ruby was “No way to create simple blocks”. This is in contrast to the shortcut notations of other languages, such as Scala’s underscore, Clojure’s percent and Groovy’s “it”. There are other languages with equivalent mechanisms as well. Even newer languages like CoffeeScript have considered adding it. As James Iry mentioned, such constructs are in fact related to delimited continuations.

However, Ruby has syntax tree manipulation (via the ParseTree gem). Using it I created the RubyUnderscore project, which brings this notation to Ruby, using the underscore symbol (just like Scala and Arc). With it, it is possible to refactor the following:

    classes.reject { |c| c.subclasses.include?(Enumerable) }
    dates.select { |d| d.greater_than(old_date) }
    collection.map { |x| x.invoke }

into:

    classes.reject _.subclasses.include? Enumerable
    dates.select _.greater_than old_date
    collection.map _.invoke

The last case can also use symbol to proc coercion (prepending & to the symbol):

     collection.map &:invoke

However, the proc coercion is not flexible enough to allow arguments or to invoke a method chain, while the underscore notation handles both. I think this brings a small increase in readability and code quality, not to mention that making closures easier to declare encourages people to use them more. I find this to be a good thing, especially when you start to refactor your loops into maps, selects, rejects, group_bys and reduces.
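
The limits of symbol to proc coercion are easy to see in plain Ruby, without RubyUnderscore. The word list below is made up for illustration.

```ruby
words = %w[arc scala ruby]

# A no-argument method call works both ways.
with_block  = words.map { |w| w.upcase }
with_symbol = words.map(&:upcase)

# A call that takes arguments cannot be written as &:symbol,
# so it needs a full block.
padded = words.map { |w| w.ljust(6, '.') }

# The same goes for a method chain.
chained = words.map { |w| w.reverse.capitalize }

puts with_symbol.inspect
puts padded.inspect
puts chained.inspect
```

The last two are exactly the cases where the underscore notation would save the explicit block parameter.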

This also highlights another issue I mentioned in Improving Ruby: that syntax tree manipulation is too important to be supported only on MRI, rather than across implementations such as JRuby and Rubinius. Python has this built into its standard library (through the ast module and inspect.getsource), and Lisp can also do syntax tree manipulation with its macro system. The importance of such a capability was mentioned by Paul Graham (one of the creators of Arc, which is a dialect of Lisp):

Letting people rewrite your language is a good idea. You, as the language designer, can’t possibly anticipate all the things programmers are going to want to do with it. To the extent they can rewrite the language, you don’t have to.

However, syntax tree manipulation in Ruby is not only unsupported in most implementations, but it is also poorly documented (even though the post on the subject by PostRank’s founder Ilya Grigorik is a very good introduction) and a bit awkward to use. The visitor from the sexp-processor gem embraces side effects (mutating all the tree nodes while processing them), and the tree nodes are just arrays, unlike Python’s modules, where there is a class for every node type. It is important to note that if you are willing to pre-process your Ruby code, you can use ruby2ruby to generate the equivalent regular Ruby code, which will work on all implementations.
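
To give a feel for what “the tree nodes are just arrays” means, here is a sketch of a rewrite that does not mutate its input. The tree below is written by hand and only approximates the general shape of such trees; it is not verified ParseTree output, and replace_underscore is a hypothetical helper, not part of sexp-processor or RubyUnderscore.

```ruby
# Hand-written, array-based tree loosely modelling `_.subclasses.include? Enumerable`.
# The node names are an assumption, not actual ParseTree output.
tree = [:call,
        [:call, [:vcall, :_], :subclasses],
        :include?,
        [:const, :Enumerable]]

# Returns a new tree in which every [:vcall, :_] node is replaced.
# The original tree is left untouched.
def replace_underscore(node, replacement)
  return node unless node.is_a?(Array)
  return replacement if node == [:vcall, :_]
  node.map { |child| replace_underscore(child, replacement) }
end

rewritten = replace_underscore(tree, [:lvar, :x])

puts tree.inspect
puts rewritten.inspect
```

With plain arrays, nothing but convention tells a call node from a constant node, which is part of what makes this style awkward compared to having a class per node type.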

These shortcomings are expected to be fixed as more people realize the gains such techniques bring, and as the improvements find their way into YARV and eventually other Ruby implementations. Ruby is a very nice, clean, productive and elegant language, and it would be a shame if we stopped making it even better.
