rake gems:unpack

Ever run the “gems:unpack” Rake task in a Rails app, and wondered why one or more of your gems was silently skipped?

Any gem that is loaded in your Rakefile (e.g. metric_fu or vlad) is considered to be a ‘framework gem’ by Rails, and such gems are not unpacked. Given that the vendor/gems directory is not yet in the load path when the Rakefile is loading, this is probably a good idea.

In other words, if you have a library that provides Rake tasks, or is otherwise necessary for your .rake files to be valid, don’t expect “config.gem” and friends to handle it for you.
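
One way to keep a Rakefile usable when such a gem is missing is to guard the require. This is only a minimal sketch, not something Rails provides: ‘optional_require’ is a hypothetical helper, and metric_fu stands in for any task-providing gem that you install as a regular system gem rather than through “config.gem”.

```ruby
# Hypothetical helper for a Rakefile: load a task-providing gem if it is
# installed, and carry on without its tasks if it is not.
def optional_require(lib)
  require lib
  true
rescue LoadError
  warn "#{lib} is not installed; skipping its Rake tasks"
  false
end

# metric_fu has to be installed as a regular system gem, because
# vendor/gems is not on the load path while the Rakefile is loading.
optional_require 'metric_fu'
```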

GitHub is pretty awesome

Here are some of the things the GitHub team totally nailed:

  • Projects do not get a homepage. At SourceForge and its clones, you get this lame default page that you have to maintain, even if you don’t want it. At this point in history, I think we can assume that projects will already have a homepage, no matter how small.
  • Bi-directional connections between parent projects and forks. This lowers the barrier to entry for a typical open source patch tenfold. A simple idea that seems obvious in retrospect.
  • Having user URLs so close to the toplevel makes it trivial to remember the path to a particular project without searching. Imagine github.com/users/wilson/projects/213?name=foo and give thanks for what we have instead.
  • No need to fear GitHub turning ‘evil’ or shutting off all the servers. Every clone of the project is another complete backup, unlike the nightmare of Subversion.
  • A ‘forker’ can trivially request an upstream developer’s attention via a pull request.
  • Optimized for storing and displaying source code, rather than downloading binaries.

There is a little bit of pain here during the ‘transition period’ as GitHub takes over the open source world; existing projects will have to either enforce a single technique, or start checking two different places for patches. Presumably better Trac / Lighthouse integration with GitHub is forthcoming, however.

I recently had the opportunity to spin off a fork of RSpec in order to work on better Rubinius support; thus far the process has been painless for both me and the RSpec team. Nice work, GitHub.

Calling in the dark

Some of you may have overheard me on the first day of RailsConf bitching about some crazy piece of code in RSpec that was totally broken in Rubinius. If not, well, here is that code:


eval("caller", registration_binding_block.binding)

Wow, what is going on here? After tracing through the RSpec code, I learned that ‘registration_binding_block’ was a block being turned into a Proc via block_pass. For example:


def describe(&block)
  @registration_binding_block = block
  yield
end

So now we know what’s being returned by this method, but what about the rest of that line? Some people may not know that eval can take a Proc as a second argument. For the purposes of eval, a Proc and that Proc’s binding produce the same result. So we should also be able to say:


eval("caller", registration_binding_block)

What does it even mean to ask for “caller” in a Proc’s binding? What are we asking for? The Proc in question may not even have executed yet. It could just be sitting around, waiting to be called.

It turns out that RSpec wants to know the stack trace, not from this call to ‘eval’, but back from where the “registration_binding_block” was originally written; for example, one of the user’s spec files. RSpec uses this fairly extreme meta-programming trick in order to match your specs back to the file and line they were written on.

After the massive headache of understanding what the fix was, it turned out to be fairly easy to hack into Rubinius, and most of the RSpec specs now pass.

For everyone running benchmarks on unfinished Ruby implementations... good luck keeping your speed when you are done with the really fun Ruby features.

Anyone out there know an easier way to ask a Proc what its “definition trace” is than what RSpec uses? I can’t think of one myself yet...
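
For what it’s worth, later Ruby versions (1.9 and up) have Proc#source_location, which answers a narrower version of this question: it reports the file and line where a block was written, though not the full stack that the eval trick recovers. A small sketch, where ‘remember’ is a made-up stand-in for a method like ‘describe’:

```ruby
# Hypothetical stand-in for a method like RSpec's describe: it keeps
# the block around without calling it.
def remember(&block)
  block
end

blk = remember { :never_called }

# Proc#source_location (Ruby 1.9+) returns [file, line] for the place
# the block was written, even though the block has not run.
file, line = blk.source_location
puts "block defined at #{file}:#{line}"
```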

Merb on Rubinius

Now that we have working OpenSSL support, we can run Merb:

fastness ➞ merbinius -a webrick  
~ Loaded DEVELOPMENT Environment...
~ Compiling routes...
~ Using 'share-nothing' cookie sessions (4kb limit per client)
~ Using Webrick adapter
~ WEBrick 1.3.1
~ ruby 1.8.6 (05/07/2008) [i686-apple-darwin9.2.2]
~ TCPServer.new(0.0.0.0, 4000)
~ Rack::Handler::WEBrick is mounted on /.
~ WEBrick::HTTPServer#start: pid=61939 port=4000
~ accept: 127.0.0.1:59883
~ Rack::Handler::WEBrick is invoked.
~ Request:
~ Routed to: {:action=>"index", :controller=>"hello"}
~ Params: {"action"=>"index", "controller"=>"hello"}
~ {:after_filters_time=>1.8e-05, :before_filters_time=>3.1e-05, :dispatch_time=>0.069112, :action_time=>0.068106}

Implementing define_method

A walkthrough of how ‘define_method’ is implemented in Rubinius.

This week I participated in the first Rubinius sprint in Denver, CO. A good time was had by all, and quite a lot of useful code was cranked out.

Brian has covered the basics rather well at the above link, so I won’t bother to repeat them here, other than to thank Sun for sponsoring the travel expenses. Sun is showing a lot of class in the Ruby community by paying attention to more than their own JRuby project.

Prior to the sprint, one Ruby feature that was horribly broken in Rubinius was Module#define_method. In its most commonly-encountered form, this feature takes a block and ‘promotes’ it into an actual method. While it has some unfortunate limitations in ruby 1.8, this is still a very mainstream feature, and it needs to work.

I won’t torture you by showing you the code as it existed prior to the sprint, but basically it:

  1. Made a copy of the method object that called define_method
  2. Surgically removed the compiled code from said method
  3. Injected the bytecodes representing the block into the method
  4. Placed the newly-built method into the MethodTable of the appropriate class

While it is a testament to the incredible dynamism of Rubinius that this approach was even possible, it turns out that define_method has some unique requirements that weren’t obvious to me at first.

Let’s say we have a class like this:

class SomeClass
  def to_s
    "someclass"
  end
end

Now we want to define a new method on it called ‘some_method’. Generally, define_method is used when you need to create a method that ‘encloses’ variables that are available to the caller, just like a block or a Proc.

class SomeClass
  x = 5
  define_method(:some_method) { x }
end

In this case, we’ve got ‘x’, a local variable, that we want to be able to access when we call the newly-defined method.

Simply doing “def some_method” would prevent us from accessing this variable.

SomeClass.new.some_method # => 5

As expected, this returns ‘5’.

So far this is looking pretty straightforward. We can access the calling scope at runtime, when the defined method is invoked.

How about this, though?

class SomeClass
  define_method(:some_method) { self }
end

If, based on the earlier code, you expected ‘self’ to be the caller of define_method, you would be wrong.

p SomeClass.new.some_method # => "someclass"

If you were, don’t feel bad. Evan and I guessed wrong too.
If you weren’t fooled, you are smart and should come contribute to Rubinius.

As you can see, ‘self’ is what it would be if you had defined the method normally. Calling the new method seems to behave more like ‘instance_eval’ than ‘call’. Even more importantly, self is known only when the method is called, not when it is defined. Each invocation might give a different result, just like a normal method.
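
The difference between ‘call’ and ‘instance_eval’ can be seen in plain Ruby, without define_method at all. A small sketch (the variable names are only for illustration):

```ruby
class SomeClass
  def to_s
    "someclass"
  end
end

# A block written outside the class, so its own self is not a SomeClass.
blk = proc { self }

obj = SomeClass.new

# 'call' runs the block with the self it was written under.
called_self = blk.call

# 'instance_eval' runs the same block with self rebound to the receiver,
# which is how a method made by define_method behaves.
evaled_self = obj.instance_eval(&blk)

puts called_self.equal?(obj)  # false
puts evaled_self.equal?(obj)  # true
```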

To implement this, I added a new Rubinius ‘primitive’ that implements what we are calling a ‘Delegated Method’. A delegated method is a placeholder in the method table that, when called, executes the necessary code in the correct context.

t1 = NTH_FIELD(mo, 4); // Which method to call
t2 = NTH_FIELD(mo, 5); // What are we calling this method on
t3 = NTH_FIELD(mo, 6); // Do we need 'self' to be available?
if(Qtrue == t3) {
  num_args++; // Method expects 'self' as an argument
} else {
  stack_pop(); // Discard self
}
cpu_send_method2(state, c, t2, t1, num_args, Qnil); // Invoke it

This code is fairly typical of the parts of Rubinius that are implemented in C.
In other words, it is pretty easy to understand at first glance, and very short.

This approach suddenly makes implementing define_method straightforward:

def define_method(name, meth = nil, &prc)
  meth ||= prc

  if meth.kind_of?(Proc)
    block_env = meth.block
    cm = DelegatedMethod.build(:call_on_instance, block_env, true)
  elsif meth.kind_of?(Method)
    cm = DelegatedMethod.build(:call, meth, false)
  elsif meth.kind_of?(UnboundMethod)
    cm = DelegatedMethod.build(:call_on_instance, meth, true)
  else
    raise TypeError, "wrong argument type #{meth.class} (expected Proc/Method)"
  end

  self.method_table[name.to_sym] = cm
  VM.reset_method_cache(name.to_sym)
  meth
end

In the case of our example code, execution will follow the “if meth.kind_of?(Proc)” path. define_method can also take a Method object in ruby 1.8, hence the other code branches.

This code:

  1. Fetches the block that was given to define_method
  2. Makes a new DelegatedMethod that wraps up the block
  3. Adds the new method to the method table
  4. Resets the method cache so that any older versions of this method are discarded

A nice side-effect of this approach is that the newly-defined method is almost as fast as a normal one, bypassing the extremely large slowdown experienced in ruby 1.8.

Implementing a Ruby VM in Ruby turns out to feel pretty natural. Next up on the chopping block, eval.