rake gems:unpack

Ever run the “gems:unpack” Rake task in a Rails app, and wondered why one or more of your gems was silently skipped?

Any gem that is loaded in your Rakefile (e.g. metric_fu, Vlad, etc) is considered to be a ‘framework gem’ by Rails, and such gems are not unpacked. Given that the vendor/gems directory is not yet in the load path when the Rakefile is loading, this is probably a good idea.

In other words, if you have a library that provides Rake tasks, or is otherwise necessary for your .rake files to be valid, don’t expect “config.gem” and friends to handle it for you.
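One way to live with this, sketched here as an assumption rather than anything Rails provides: guard the require in your Rakefile so that a missing gem only disables its own tasks. The helper name optional_require is hypothetical.

```ruby
# Hypothetical Rakefile helper: try to load a library that provides Rake
# tasks and report whether it is available, instead of letting a LoadError
# take down every other task.
def optional_require(library)
  require library
  true
rescue LoadError
  false
end

# 'metric_fu' is simply the gem mentioned above; use whichever
# task-providing library your Rakefile depends on.
METRICS_AVAILABLE = optional_require('metric_fu')
```

Task definitions that need the library can then be wrapped in a check on METRICS_AVAILABLE.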

Calling in the dark

Some of you may have overheard me on the first day of RailsConf bitching about some crazy piece of code in RSpec that was totally broken in Rubinius. If not, well, here is that code:


eval("caller", registration_binding_block.binding)

Wow, what is going on here? After tracing through the RSpec code, I learned that ‘registration_binding_block’ was a block being turned into a Proc via block_pass. For example:


def describe(&block)
  @registration_binding_block = block
  yield
end

So now we know what’s being returned by this method, but what about the rest of that line? Some people may not know that, in Ruby 1.8, eval can take a Proc as its second argument. For the purposes of eval, a Proc and that Proc’s binding produce the same result. So we should also be able to say:


eval("caller", registration_binding_block)

What does it even mean to ask for “caller” in a Proc’s binding? What are we asking for? The Proc in question may not even have executed yet. It could just be sitting around, waiting to be called.

It turns out that RSpec wants to know the stack trace, not from this call to ‘eval’, but back from where the “registration_binding_block” was originally written; for example, one of the user’s spec files. RSpec uses this fairly extreme meta-programming trick in order to match your specs back to the file and line they were written on.
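For the simple case of "where was this block written?", Ruby 1.9 and later offer Proc#source_location. This is not what RSpec did at the time, and it returns a single file and line rather than a full trace. The method name remember_block below is a hypothetical stand-in for RSpec's describe.

```ruby
# Capture a block as a Proc, the same way the describe example above does.
# remember_block is a made-up name for illustration, not RSpec's API.
def remember_block(&block)
  block
end

registered = remember_block { :spec_body }

# Ask the Proc where it was written, without ever calling it.
file, line = registered.source_location
puts "block written at #{file}:#{line}"
```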

After the massive headache of understanding what the fix was, it turned out to be fairly easy to hack into Rubinius, and most of the RSpec specs now pass.

For everyone running benchmarks on unfinished Ruby implementations: good luck keeping your speed once you are done with the really fun Ruby features.

Anyone out there know an easier way to ask a Proc what its “definition trace” is than what RSpec uses? I can’t think of one myself yet.

Merb on Rubinius

Now that we have working OpenSSL support, we can run Merb:

fastness ➞ merbinius -a webrick  
~ Loaded DEVELOPMENT Environment...
~ Compiling routes...
~ Using 'share-nothing' cookie sessions (4kb limit per client)
~ Using Webrick adapter
~ WEBrick 1.3.1
~ ruby 1.8.6 (05/07/2008) [i686-apple-darwin9.2.2]
~ TCPServer.new(0.0.0.0, 4000)
~ Rack::Handler::WEBrick is mounted on /.
~ WEBrick::HTTPServer#start: pid=61939 port=4000
~ accept: 127.0.0.1:59883
~ Rack::Handler::WEBrick is invoked.
~ Request:
~ Routed to: {:action=>"index", :controller=>"hello"}
~ Params: {"action"=>"index", "controller"=>"hello"}
~ {:after_filters_time=>1.8e-05, :before_filters_time=>3.1e-05, :dispatch_time=>0.069112, :action_time=>0.068106}

Implementing define_method

A walkthrough of how ‘define_method’ is implemented in Rubinius.

This week I participated in the first Rubinius sprint in Denver, CO. A good time was had by all, and quite a lot of useful code was cranked out.

Brian has covered the basics rather well at the above link, so I won’t bother to repeat them here, other than to thank Sun for sponsoring the travel expenses. Sun is showing a lot of class in the Ruby community by paying attention to more than their own JRuby project.

Prior to the sprint, one Ruby feature that was horribly broken in Rubinius was Module#define_method. In its most commonly-encountered form, this feature takes a block and ‘promotes’ it into an actual method. While it has some unfortunate limitations in Ruby 1.8, this is still a very mainstream feature, and it needs to work.

I won’t torture you by showing you the code as it existed prior to the sprint, but basically it:

  1. Made a copy of the method object that called define_method
  2. Surgically removed the compiled code from said method
  3. Injected the bytecodes representing the block into the method
  4. Placed the newly-built method into the MethodTable of the appropriate class

While it is a testament to the incredible dynamism of Rubinius that this approach was even possible, it turns out that define_method has some unique requirements that weren’t obvious to me at first.

Let’s say we have a class like this:

class SomeClass
  def to_s
    "someclass"
  end
end

Now we want to define a new method on it called ‘some_method’. Generally, define_method is used when you need to create a method that ‘encloses’ variables that are available to the caller, just like a block or a Proc.

class SomeClass
  x = 5
  define_method(:some_method) { x }
end

In this case, we’ve got ‘x’, a local variable, that we want to be able to access when we call the newly-defined method.

Simply doing “def some_method” would prevent us from accessing this variable.

SomeClass.new.some_method # => 5

As expected, this returns ‘5’.

So far this is looking pretty straightforward. We can access the calling scope at runtime, when the defined method is invoked.

How about this, though?

class SomeClass
  define_method(:some_method) { self }
end

If, based on the earlier code, you expected ‘self’ to be the caller of define_method, you would be wrong.

SomeClass.new.some_method.to_s # => "someclass"

If you were, don’t feel bad. Evan and I guessed wrong too.
If you weren’t fooled, you are smart and should come contribute to Rubinius.

As you can see, ‘self’ is what it would be if you had defined the method normally. Calling the new method seems to behave more like ‘instance_eval’ than ‘call’. Even more importantly, self is known only when the method is called, not when it is defined. Each invocation might give a different result, just like a normal method.
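Here is a small self-contained sketch of both behaviours together: locals are captured when the method is defined, while self is decided on each call. The class name Widget and its method names are made up for illustration.

```ruby
class Widget
  label = "widget"   # a local, visible to the blocks below but not to def

  # The block closes over 'label', but 'self' is the receiver of each call.
  define_method(:describe) { "#{label} #{object_id}" }
  define_method(:me) { self }
end

a = Widget.new
b = Widget.new
puts a.me.equal?(a)   # prints true
puts b.me.equal?(b)   # prints true

# Roughly the same effect by hand: run a block with self rebound.
block = proc { self }
puts a.instance_eval(&block).equal?(a)   # prints true
```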

To implement this, I added a new Rubinius ‘primitive’ that implements what we are calling a ‘Delegated Method’. A delegated method is a placeholder in the method table that, when called, executes the necessary code in the correct context.

t1 = NTH_FIELD(mo, 4); // Which method to call
t2 = NTH_FIELD(mo, 5); // What are we calling this method on
t3 = NTH_FIELD(mo, 6); // Do we need 'self' to be available?
if(Qtrue == t3) {
  num_args++; // Method expects 'self' as an argument
} else {
  stack_pop(); // Discard self
}
cpu_send_method2(state, c, t2, t1, num_args, Qnil); // Invoke it

This code is fairly typical of the parts of Rubinius that are implemented in C.
In other words, it is pretty easy to understand at first glance, and very short.
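If C is not your thing, the same dispatch can be modelled in plain Ruby. This is only a sketch of the idea, not Rubinius code: the class names BlockEnvSketch and DelegatedMethodSketch are hypothetical, and instance_exec stands in for whatever the VM really does to rebind self.

```ruby
# Hypothetical stand-in for a block environment: it can run its block
# with self rebound to a given instance.
class BlockEnvSketch
  def initialize(&block)
    @block = block
  end

  def call_on_instance(instance, *args)
    instance.instance_exec(*args, &@block)
  end
end

# Hypothetical model of the placeholder that sits in the method table.
class DelegatedMethodSketch
  def initialize(message, target, pass_self)
    @message = message       # which method to call
    @target = target         # what we are calling it on
    @pass_self = pass_self   # does the target need the receiver?
  end

  # Forward the call, prepending the receiver only when it is wanted.
  def invoke(receiver, *args)
    args = [receiver, *args] if @pass_self
    @target.__send__(@message, *args)
  end
end

env = BlockEnvSketch.new { |n| "#{self} x#{n}" }
delegated = DelegatedMethodSketch.new(:call_on_instance, env, true)
puts delegated.invoke("receiver", 2)   # prints "receiver x2"
```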

This approach suddenly makes implementing define_method straightforward:

def define_method(name, meth = nil, &prc)
  meth ||= prc

  if meth.kind_of?(Proc)
    block_env = meth.block
    cm = DelegatedMethod.build(:call_on_instance, block_env, true)
  elsif meth.kind_of?(Method)
    cm = DelegatedMethod.build(:call, meth, false)
  elsif meth.kind_of?(UnboundMethod)
    cm = DelegatedMethod.build(:call_on_instance, meth, true)
  else
    raise TypeError, "wrong argument type #{meth.class} (expected Proc/Method)"
  end

  self.method_table[name.to_sym] = cm
  VM.reset_method_cache(name.to_sym)
  meth
end

In the case of our example code, execution will follow the “if meth.kind_of?(Proc)” path. define_method can also take a Method object in Ruby 1.8, hence the other code branches.

This code:

  1. Fetches the block that was given to define_method
  2. Makes a new DelegatedMethod that wraps up the block
  3. Adds the new method to the method table
  4. Resets the method cache so that any older versions of this method are discarded
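Seen from the caller's side, the Proc and UnboundMethod branches correspond to the two forms below. This runs on ordinary Ruby and shows only the public behaviour, not the Rubinius internals; the Temperature class is made up for illustration.

```ruby
class Temperature
  def initialize(degrees)
    @degrees = degrees
  end

  def to_s
    "#{@degrees} degrees"
  end

  # Proc form: the block becomes the method body.
  define_method(:warmer) { Temperature.new(@degrees + 1) }

  # UnboundMethod form: reuse an existing method body under a new name.
  define_method(:describe, instance_method(:to_s))
end

puts Temperature.new(20).warmer.describe   # prints "21 degrees"
```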

A nice side-effect of this approach is that the newly-defined method is almost as fast as a normal one, avoiding the extremely large slowdown that define_method incurs in Ruby 1.8.

Implementing a Ruby VM in Ruby turns out to feel pretty natural. Next up on the chopping block, eval.

Lame Code Considered Harmful

In this chapter, our intrepid hero writes slightly-more-tolerable code.

This is a follow-up to my earlier article Making a mockery of ActiveRecord. That seemed to prompt a fair amount of useful discussion. One of the most interesting responses can be found here, on James Mead’s blog. He is more polite about it than this, but to summarize: “Your tests would be easier to write if your code didn’t suck.”

One way to think about testing (particularly in a message-oriented language like Ruby) is as a weak mathematical proof. Don’t take that analogy too far, since we aren’t really formally proving anything. Consult a medical professional before using my blog, or any other drug.

Suppose we have a method called ‘a’, and it invokes the methods ‘b’, ‘c’, and ‘d’ to get its job done. If we have already written tests for ‘b’, ‘c’, and ‘d’ showing that they behave correctly when given a certain set of inputs, all that is left is to show that ‘a’ gives them that input. This is the concept that lets us use mocks, and still be comfortable that our code is probably correct. We don’t need to re-test method ‘b’ every time it is called. We already know that it works.
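As a minimal sketch of that idea, with a hand-rolled recording double instead of a mocking library: Report#summary plays the role of ‘a’ and the formatter's render method plays the role of ‘b’. All of these names are hypothetical.

```ruby
# Stand-in for 'b': it records its input instead of doing real work.
class RecordingFormatter
  attr_reader :received

  def initialize
    @received = []
  end

  def render(value)
    @received << value
    "rendered"
  end
end

# Holds 'a': summary computes a total and hands it to its collaborator.
class Report
  def initialize(formatter)
    @formatter = formatter
  end

  def summary(numbers)
    total = numbers.inject(0) { |sum, n| sum + n }
    @formatter.render(total)
  end
end

formatter = RecordingFormatter.new
Report.new(formatter).summary([1, 2, 3])
puts formatter.received.inspect   # prints [6]
```

The test for summary only has to show that render received 6. Whether render does the right thing with 6 is covered by its own tests.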

What does this have to do with my previous article? James indirectly pointed out to me that I hadn’t written a Composed Method, and I was making life hard for myself.
Here’s what the code looked like when I wrote the article:

 1 def generate_email_messages
 2   people = self.recipients
 3   Message.transaction do
 4     people.each do |person|
 5       unless person.campaigns.include?(self.campaign)
 6         person.subscriptions.create :campaign => self
 7       end
 8       em = self.email_messages.create :person => person, :message => self, 
 9         :direction => 'mt', :sent_at => Time.now.utc, :body => self.body
10       unless em.valid?
11         raise RuntimeError.new("Unable to ..blah.. for #{person.email_address}.")
12       end
13     end
14   end
15   people.size
16 end

This isn’t too bad, but it definitely has some flaws. On line 5, I am reaching out ‘through’ Person, and interrogating it to decide whether I need to create a Subscription. Clearly, the Person model should be deciding whether or not it needs a subscription.

The same thing happens again on line 10. I reach through the email_messages association, and then raise an error if the creation failed. The EmailMessage model should contain the knowledge of what constitutes an error. This is particularly true since this example is greatly simplified, and the real model uses acts_as_state_machine with conditional validations. As your models grow in complexity, the importance of having the functionality in the right place rises.

Here’s what the code looks like now.

def generate_email_messages
  people = self.recipients
  Message.transaction do
    people.each do |person|
      person.subscribe_to(self.campaign)
      EmailMessage.generate_message_for(self, person)
    end
  end
  people.size
end

There are new methods on Person and EmailMessage that encapsulate the necessary steps. At first glance, those methods look trivial. Neither is longer than three lines. However, they can now be tested in isolation, and work the same way with both existing and unsaved objects. ActiveRecord’s association proxies (email_messages, campaigns, etc) are powerful tools, but they can also distract you from the actual goal, which is always to write readable programs with as few defects as possible.
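The two new methods might look something like the sketch below. To keep it runnable without a database, it uses plain Ruby objects in place of the ActiveRecord models, so everything except the names subscribe_to and generate_message_for is an assumption, and the real versions work through the associations instead.

```ruby
# Plain-Ruby stand-in for the Person model.
class Person
  attr_reader :email_address, :campaigns

  def initialize(email_address)
    @email_address = email_address
    @campaigns = []
  end

  # The person decides whether a new subscription is needed.
  def subscribe_to(campaign)
    campaigns << campaign unless campaigns.include?(campaign)
  end
end

# Plain-Ruby stand-in for the EmailMessage model.
class EmailMessage
  attr_reader :person, :body

  def initialize(person, body)
    @person = person
    @body = body
  end

  # Assumed validity rule for this sketch: the body must not be empty.
  def valid?
    !body.to_s.empty?
  end

  # The model that owns the validity rules also owns the error.
  def self.generate_message_for(message, person)
    email = new(person, message.body)
    raise "Unable to generate email for #{person.email_address}." unless email.valid?
    email
  end
end

message = Struct.new(:body).new("Hello!")
person = Person.new("pat@example.com")
person.subscribe_to(:newsletter)
puts EmailMessage.generate_message_for(message, person).body   # prints "Hello!"
```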