Implementing define_method

A walkthrough of how ‘define_method’ is implemented in Rubinius.

A walkthrough of how ‘define_method’ is implemented in Rubinius

This week I participated in the first Rubinius sprint in Denver, CO. A good time was had by all, and quite a lot of useful code was cranked out.

Brian has covered the basics rather well at the above link, so I won’t bother to repeat them here, other than to thank Sun for sponsoring the travel expenses. Sun is showing a lot of class in the Ruby community by paying attention to more than their own JRuby project.

Prior to the sprint, one Ruby feature that was horribly broken in Rubinius was Module#define_method. In its most commonly-encountered form, this feature takes a block and ‘promotes’ it into an actual method. While it has some unfortunate limitations in ruby 1.8, this is still a very mainstream feature, and it needs to work.I won’t torture you by showing you the code as it existed prior to the sprint, but basically it:

  1. Made a copy of the method object that called define_method
  2. Surgically removed the compiled code from said method
  3. Injected the bytecodes representing the block into the method
  4. Placed the newly-built method into the MethodTable of the appropriate class

While it is a testament to the incredible dynamism of Rubinius that this approach was even possible, it turns out that define_method has some unique requirements that weren’t obvious to me at first.Let’s say we have a class like this:

class SomeClass
  def to_s
    "someclass"
  end
end

Now we want to define a new method on it called ‘some_method’. Generally, define_method is used when you need to create a method that ‘encloses’ variables that are available to the caller, just like a block or a Proc.

class SomeClass
  x = 5
  define_method(:some_method) { x }
end

In this case, we’ve got ‘x’, a local variable, that we want to be able to access when we call the newly-defined method.

Simply doing “def some_method” would prevent us from accessing this variable.

SomeClass.new.some_method # => 5

As expected, this return ‘5’.

So far this is looking pretty straightforward. We can access the calling scope at runtime, when the defined method is invoked.

How about this, though?

class SomeClass
  define_method(:some_method) { self }
end

If, based on the earlier code, you expected ‘self’ to be the caller of define_method, you would be wrong.

p SomeClass.new.some_method # => "someclass"

If you were, don’t feel bad. Evan and I guessed wrong too.
If you weren’t fooled, you are smart and should come contribute to Rubinius.

As you can see, ‘self’ is what it would be if you had defined the method normally. Calling the new method seems to behave more like ‘instance_eval’ than ‘call’. Even more importantly, self is known only when the method is called, not when it is defined. Each invocation might give a different result, just like a normal method.

To implement this, I added a new Rubinius ‘primitive’ that implements what we are calling a ‘Delegated Method’. A delegated method is a placeholder in the method table that, when called, executes the necessary code in the correct context.

t1 = NTH_FIELD(mo, 4); // Which method to call
t2 = NTH_FIELD(mo, 5); // What are we calling this method on
t3 = NTH_FIELD(mo, 6); // Do we need 'self' to be available?
if(Qtrue == t3) {
  num_args++; // Method expects 'self' as an argument
} else {
  stack_pop(); // Discard self
}
cpu_send_method2(state, c, t2, t1, num_args, Qnil); // Invoke it

This code is fairly typical of the parts of Rubinius that are implemented in C.
In other words, it is pretty easy to understand at first glance, and very short.

This approach suddenly makes implementing define_method straightforward:

def define_method(name, meth = nil, &prc)
  meth ||= prc

  if meth.kind_of?(Proc)
    block_env = meth.block
    cm = DelegatedMethod.build(:call_on_instance, block_env, true)
  elsif meth.kind_of?(Method)
    cm = DelegatedMethod.build(:call, meth, false)
  elsif meth.kind_of?(UnboundMethod)
    cm = DelegatedMethod.build(:call_on_instance, meth, true)
  else
    raise TypeError, "wrong argument type #{meth.class} (expected Proc/Method)"
  end

  self.method_table[name.to_sym] = cm
  VM.reset_method_cache(name.to_sym)
  meth
end

In the case of our example code, execution will follow the “if kind_of?(Proc)” path. define_method can also take a Method object in ruby 1.8, hence the other code branches.

This code:

  1. Fetches the block that was given to define_method
  2. Makes a new DelegatedMethod that wraps up the block
  3. Adds the new method to the method table
  4. Resets the method cache so that any older versions of this method are discarded

A nice side-effect of this approach is that the newly-defined method is almost as fast as a normal one, bypassing the extremely large slowdown experienced in ruby 1.8.

Implementing a Ruby VM in Ruby turns out to feel pretty natural. Next up on the chopping block, eval.

Lame Code Considered Harmful

In this chapter, our intrepid hero writes slightly-more-tolerable code.

This is a follow-up to my earlier article Making a mockery of ActiveRecord. That seemed to prompt a fair amount of useful discussion. One of the most interesting can be found here, on James Mead’s blog. He is more polite about it than this, but to summarize: “Your tests would be easier to write if your code didn’t suck.”

One way to think about testing (particularly in a message-oriented language like Ruby) is as a weak mathematical proof. Don’t take that analogy too far, since we aren’t really formally proving anything. Consult a medical professional before using my blog, or any other drug.

Suppose we have a method called ‘a’, and it invokes the methods ‘b’, ‘c’, and ‘d’ to get its job done. If we have already written tests for ‘b’, ‘c’, and ‘d’ showing that they behave correctly when given a certain set of inputs, all that is left is to show that ‘a’ gives them that input. This is the concept that lets us use mocks, and still be comfortable that our code is probably correct. We don’t need to re-test method ‘b’ every time it is called. We already know that it works.

What does this have to do with my previous article? James indirectly pointed out to me that I hadn’t written a Composed Method, and I was making life hard for myself.
Here’s what the code looked like when I wrote the article:

 1 def generate_email_messages
 2   people = self.recipients
 3   Message.transaction do
 4     people.each do |person|
 5       unless person.campaigns.include?(self.campaign)
 6         person.subscriptions.create :campaign => self
 7       end
 8       em = self.email_messages.create :person => person, :message => self, 
 9         :direction => 'mt', :sent_at => Time.now.utc, :body => self.body
10       unless em.valid?
11         raise RuntimeError.new("Unable to ..blah.. for #{person.email_address}.")
12       end
13     end
14   end
15   people.size
16 end

This isn’t too bad, but it definitely has some flaws. On line 5, I am reaching out ‘through’ Person, and interrogating it to decide whether I need to create a Subscription. Clearly, the Person model should be deciding whether or not it needs a subscription.

The same thing happens again on line 10. I reach through the email_messages association, and then raise an error if the creation failed. The EmailMessage model should contain the knowledge of what constitutes an error. This is particularly true since this example is greatly simplified, and the real model uses acts_as_state_machine with conditional validations. As your models grow in complexity, the importance of having the functionality in the right place rises.

Here’s what the code looks like now.

 1 def generate_email_messages
 2   people = self.recipients
 3   Message.transaction do
 4     people.each do |person|
 5       person.subscribe_to(self.campaign)
 6       EmailMessage.generate_message_for(self, person)
 7     end
 8   end
 9   people.size
10 end

There are new methods on Person and EmailMessage that encapsulate the necessary steps. At first glance, those methods look trivial. Neither is longer than three lines. However, they can now be tested in isolation, and work the same way with both existing and unsaved objects. ActiveRecord’s association proxies (email_messages, campaigns, etc) are powerful tools, but they can also distract you from the actual goal, which is always to write readable programs with as few defects as possible.

Making a mockery of ActiveRecord

This is long, so grab a drink

I had a good RSpec experience today (well, it is a weekday) that I thought was worth sharing.

If this is your first time reading about mocks, or if you’re just rusty, you may want to hit up this nice collection of info: Ruby Mock ObjectsThe setup:

  • An email Campaign.
  • A Message that represents what is arguably a template.
  • People and semi-dynamic Lists of people.
  • People can be connected to a Campaign directly via a Subscription (an opt-in), or they can simply be on a List. 
    (We won’t see the reasoning for that in this simplified example, but just pretend it makes sense.)

When Messages are ready to be delivered, they are told to generate EmailMessage records.
EmailMessages are connected directly to People, and represent what will eventually be sent as real emails.

If this seems involved, it’s because you haven’t seen the customer requirements, so stop complaining.

The traditional Rails solution to testing this would be to make fixtures for all of the above models, and any join models.
You would then load and roll back those fixtures numerous times, repetitively testing what you already know; namely, that your database service has been started properly.

This has a number of drawbacks, not least of which is that you are probably referring back to your fixtures as you write your tests, to make sure you get the names right, and that you pick the appropriate fixture for each scenario.

Enter Mocks.

Even in an alternate world where mocks were no more expressive than fixtures, they would be worth using simply to have your ‘setup’ information right there on screen as you write your specs. Luckily, they are a lot cooler than that.
I’ll take you through a tour of the various steps along the way to specifying the behavior of the EmailMessage generation process, ‘mock-first’.

Round One

Here is the first cut at the code.

 1 context "A Message being generated for delivery" do
 2   setup do
 3     @campaign = mock("campaign")
 4     @campaign.stub!(:is_a?).and_return(true)
 5     @campaign.stub!(:new_record?).and_return(false)
 6     @campaign.stub!(:id).and_return(rand(1000))
 7 
 8     @person = mock("person")
 9     @person.stub!(:is_a?).and_return(true)
10     @person.stub!(:new_record?).and_return(false)
11     @person.stub!(:id).and_return(rand(1000))
12 
13     @message = Message.new :name => "Hello, world", :campaign => @campaign
14     @list = mock("list")
15     @subscriptions = mock("subscriptions")
16   end
17 
18   specify "should create EmailMessages but not Subscriptions when include_subscribers is false" do
19     @message.should_receive(:lists).and_return([@list])
20     @list.should_receive(:people).and_return([@person])
21     @message.should_receive(:campaign).and_return(@campaign)
22     @person.should_receive(:campaigns).and_return([@campaign])
23     @message.generate.should == 1
24     @message.should_have(1).email_messages
25   end
26 
27   specify "should create EmailMessages and Subscriptions when include_subscribers is true" do
28     @message.include_subscribers = true
29     @message.should_receive(:lists).and_return([@list])
30     @message.should_receive(:campaign).twice.and_return(@campaign)
31     @list.should_receive(:people).and_return([@person])
32     @person.should_receive(:campaigns).and_return([])
33 
34     @person.should_receive(:subscriptions).and_return(@subscriptions)
35     @subscriptions.should_receive(:create).and_return(nil)
36     @campaign.should_receive(:people).and_return([])
37     @message.should_have(1).email_messages
38   end
39 end
40 

This is pretty decent, really. Probably fewer lines of code than the equivalent fixtures, and faster because we’re not hitting the database.
On the other hand, there’s ‘setup’ everywhere, and it’s hard to see what the actual expectations are.
@message.should_have(1).email_messages is like a needle in a haystack.

Also, I’ve got everything in a single ‘context’, even though quite literally there are two different ones in play here.
I simply didn’t want to have to repeat the ‘setup’ block in a second context, and got lazy.

Let’s fix that first, since maybe it’s at the heart of the problem.

Round Two

 1 module MessageSetup
 2   def setup_message
 3     @campaign = mock("campaign")
 4     @campaign.stub!(:is_a?).and_return(true)
 5     @campaign.stub!(:new_record?).and_return(false)
 6     @campaign.stub!(:id).and_return(rand(1000))
 7 
 8     @person = mock("person")
 9     @person.stub!(:is_a?).and_return(true)
10     @person.stub!(:new_record?).and_return(false)
11     @person.stub!(:id).and_return(rand(1000))
12 
13     @message = Message.new :name => "Hello, world", :campaign => @campaign
14     @list = mock("list")
15     @subscriptions = mock("subscriptions")
16   end
17 end
18 
19 context "A Message being generated without include_subscribers" do
20   include MessageSetup
21   setup { setup_message }
22 
23   specify "should create EmailMessages but not Subscriptions" do
24     @message.should_receive(:lists).and_return([@list])
25     @list.should_receive(:people).and_return([@person])
26     @message.should_receive(:campaign).and_return(@campaign)
27     @person.should_receive(:campaigns).and_return([@campaign])
28     @message.generate.should == 1
29     @message.should_have(1).email_messages
30   end
31 end
32 
33 context "A Message being generated with include_subscribers enabled" do
34   include MessageSetup
35   setup { setup_message }
36 
37   specify "should create Subscriptions and EmailMessages" do
38     @message.include_subscribers = true
39     @message.should_receive(:lists).and_return([@list])
40     @message.should_receive(:campaign).twice.and_return(@campaign)
41     @list.should_receive(:people).and_return([@person])
42     @person.should_receive(:campaigns).and_return([])
43 
44     @person.should_receive(:subscriptions).and_return(@subscriptions)
45     @subscriptions.should_receive(:create).and_return(nil)
46     @campaign.should_receive(:people).and_return([])
47     @message.generate.should == 1
48     @message.should_have(1).email_messages
49   end
50 end
51 

Here I’ve taken the ‘setup’ code, moved it into a module, and mixed it into the two contexts.
It looks a little showy, but at least it keep the setup code away from the specs.
Also, now that there are two separate contexts, the specs can be a little more readable.

All that being said, I’m still not happy. The ugly stubs required to fake out ActiveRecord are distracting, and appear in every context that needs to say:
@real_activerecord_model.some_association = some_mock

Under the hood, ActiveRecord checks to make sure the association is being handed a compatible object. We have to stub that to make this all work. Why are we bothering again? Test isolation. Future model changes that don’t affect this code shouldn’t break these specs. If you are dealing with six or seven model classes per spec, that is ridiculously easy to do.

What can we do about this boilerplate? Well, let’s make a helper.

Round Three

 1 module MessageSetup
 2   def setup_message
 3     mock_model :campaign
 4     mock_model :person do |m|
 5       m.should_receive(:campaigns).and_return([@campaign])
 6     end
 7     mock_model :list do |m|
 8       m.should_receive(:people).and_return([@person])
 9     end
10     @message = Message.new :name => "Hello, world", :campaign => @campaign
11   end
12 end
13 
14 context "A Message being generated without include_subscribers" do
15   include MessageSetup
16   setup { setup_message }
17 
18   specify "should create EmailMessages but not Subscriptions" do
19     @message.should_receive(:lists).and_return([@list])
20     @message.should_receive(:campaign).and_return(@campaign)
21     @message.generate.should == 1
22     @message.should_have(1).email_messages
23   end
24 end
25 
26 context "A Message being generated with include_subscribers enabled" do
27   include MessageSetup
28   setup do
29     setup_message
30     @subscriptions = mock("subscriptions")
31     @campaign.should_receive(:people).and_return([])
32     @person.should_receive(:subscriptions).and_return(@subscriptions)
33     @message.include_subscribers = true
34   end
35 
36   specify "should create Subscriptions and EmailMessages" do
37     @message.should_receive(:lists).and_return([@list])
38     @message.should_receive(:campaign).twice.and_return(@campaign)
39 
40     @subscriptions.should_receive(:create).and_return(nil)
41     @message.generate.should == 1
42     @message.should_have(1).email_messages
43   end
44 end
45 

Hey, that’s looking pretty good. “mock_model” says pretty directly what the mission is, and having it take a block avoids the need to repeat the name of the instance variable. 
In fact, there’s no longer a need to assign the mock return value to something, since, in typical ‘convention over configuration’ style, the mock automatically goes into an instance variable with the same name as the model.

How did we do? Let’s see what the “time” command has to say:

time drbspec spec/models/message_spec.rb

.............

Finished in 0.268692 seconds

13 specifications, 0 failures
drbspec spec/models/message_spec.rb  
0.08s user 
0.02s system 
11% cpu 
0.852 total

Under a second of wall-clock time isn’t half bad.

Here’s the helper code, which goes in the ‘EvalContext’ of spec_helper.rb.
I’ll probably be adding some more features before submitting it as a patch, but you’re welcome to it.

 1 def mock_model(name)
 2   name = name.to_s
 3   m = mock(name)
 4   instance_variable_set("@#{name}", m)
 5   m.stub!(:id).and_return(rand(10_000))
 6   m.stub!(:new_record?).and_return(false)
 7   klass = name.singularize.camelize
 8   m.send(:__mock_handler).instance_eval <<-CODE
 9           def @target.is_a?(other)
10             other == #{klass}
11           end
12           def @target.class
13             #{klass}
14           end
15   CODE
16   yield m if block_given?
17 end