Wednesday, February 8, 2012

Metaclasses, first class objects, and a lesson from SICP.

This week my SICP Study Group is looking at functions as first-class objects. This means, among other things, that functions can be passed as arguments and returned as values of other functions. A few members of my group feel that the material covered is too theoretical and has little real-world value. I'm always a bit bemused by this, because I use ideas from SICP nearly every day, especially when it comes to higher-order functions. So I thought I'd use an example from my day job to show how this section of SICP is particularly useful.
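
For readers who want to see the idea in Python rather than Scheme, here is a minimal sketch. The names `compose`, `apply_twice`, `square`, and `increment` are made up for illustration; they are not from SICP or from my work code.

```python
def compose(f, g):
    """Return a new function that applies g first, then f."""
    def composed(x):
        return f(g(x))
    return composed


def apply_twice(f, x):
    """Take a function as an argument and apply it two times."""
    return f(f(x))


def square(x):
    return x * x


def increment(x):
    return x + 1


# A function returned as a value...
square_then_increment = compose(increment, square)
print(square_then_increment(3))  # 10

# ...and a function passed as an argument.
print(apply_twice(square, 3))  # 81
```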

Part of my job includes writing data-entry applications so that other team members can interact with my statistical models. I write these applications in Python, using a library called web.py. In this library, we can assign to each web request URI a class to handle it.

What's a Class?


It's something we haven't seen before in our study group! I don't want to explain it in detail (you can check Wikipedia), but I hope it'll suffice to say this:

A class can be thought of as a way to capture, isolate, and reuse state shared by a group of functions. "State" is another concept we haven't seen in my group! We've only written stateless, functional programs. "State" here means data that exists independently of the functions, but is sometimes used or modified by the functions to perform their computations (exempli gratia, the balance of a bank account might be state shared by a group of functions that affect one's bank account). For instance, say we have a web server handling requests from the internet. Each request needs to perform a series of computations related to that request, but the data for each request, id est, the state of the request, should be kept separate from that of every other request. This is exactly why web.py uses classes for requests. We can define what data or state our groups of functions need to share, and we can make copies of those groups (called "instances" of the class) that won't interfere with each other's state.
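
The bank account idea can be sketched roughly like this. `BankAccount` and its methods are hypothetical names chosen for the example; the point is only that each instance carries its own balance.

```python
class BankAccount:
    """A group of functions (methods) sharing one piece of state: the balance."""

    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


# Two instances: each one keeps its own, separate state.
alice = BankAccount(100)
bob = BankAccount(50)

alice.deposit(25)
bob.withdraw(20)

print(alice.balance, bob.balance)  # 125 30
```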

What does a Class look like?


Classes in Python have a name, some data members, and some functions. Classes to handle web requests with web.py generally look something like this:
import web

class RequestHandler:
    request_var1 = "data..."
    request_var2 = "more data..."

    def GET(self):
        i = web.input()
        # code to handle request...
        return html_output

Web.py will look for a function named GET inside the class when the web request uses the HTTP GET method. Each time the application gets a web request, it creates a new instance of the class, id est, a fresh copy of the data and functions, and calls the new instance's GET method with the new data.
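
To make that behaviour concrete without requiring web.py itself, here is a toy dispatcher that mimics it. To be clear, this is not web.py's implementation: `ROUTES`, `dispatch`, and `HelloHandler` are invented for this sketch, and the real library does a good deal more.

```python
class HelloHandler:
    def __init__(self):
        # Per-instance state: starts fresh for every request.
        self.calls = 0

    def GET(self):
        self.calls += 1
        return "hello (call %d on this instance)" % self.calls


# Hypothetical routing table mapping a URI to the class that handles it.
ROUTES = {"/hello": HelloHandler}


def dispatch(http_method, path):
    handler_class = ROUTES[path]
    handler = handler_class()               # a new instance per request
    method = getattr(handler, http_method)  # look up GET, POST, ... by name
    return method()


# Both requests report "call 1", because no state leaks between instances.
print(dispatch("GET", "/hello"))
print(dispatch("GET", "/hello"))
```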

In this specific example, the class is really just a special "wrapping" around the GET function so web.py can use it. It won't hurt the reader who is unfamiliar with classes to regard them as functions for the remainder of this post.

Now, classes to handle web requests are often a lot more complex than that. In my case, I have specific requests that should respond with data that contains the current parameters of my statistical model, and the data should be formatted for easy entry into a web application (exempli gratia, in JSON). I have about two dozen of these requests that ask for different aspects of the same object. Hence, if I had to write them all out, each one would look something like this:
class YetAnotherRequestHandler:
    def GET(self):
        # code to process incoming request data and check for errors
        # 5-10 lines ***same for every request***.
        ...
        # code that's specific to this request, 2-3 lines.
        ...
        # code to transform output into JSON-ready format
        # another 5-10 lines that are ***the same for every request***
        return json_text

I don't want to burden the reader with those 20 lines of code that are the same for each request (nor do I want to show my employer's source code or think of my own contrived example!) but I hope I can make the point clear: these two dozen request handlers share a significant amount of code. If I were to write them all out by hand, I'd have over 80% duplicated code. Boring. Hard to read. Messy to change all at once.

Higher-order functions...erm, Classes


What if, instead of writing two dozen classes that share the same code, we could somehow abstract the similarities and only write the difference? Say, we could write a series of functions that do the specific-to-request bit:
def processor_for_request1(args...):
    # 2 lines of code here
    return data

def processor_for_request2(args...):
    # another 2 lines of code here
    return data

# and so on...

And pass those functions into some machine that will turn them into classes? Well, we can! Luckily, in Python, functions and classes are both first-class objects, so all the techniques in our section of SICP apply here. We can write a function that takes these processors as input and returns a class that uses them, like so:

def request_class_creator(processing_function):
    class Abstract_Request:
        def GET(self):
            # code to process incoming request data and check for errors
            # 5-10 lines ***same for every request***.
            data = processing_function(args...)
            # code to transform output into JSON-ready format
            # another 5-10 lines that are ***the same for every request***
            return json_text
    # hand the newly built class back to the caller
    return Abstract_Request

Request_Handler1 = request_class_creator(processor_for_request1)
Request_Handler2 = request_class_creator(processor_for_request2)
# ...and so on


In this way, we write the shared code only once, we write only the part that differs for each request, we can reuse the components that work, and we can easily see what the important parts of our program are. Additionally, we can easily write separate unit tests for each processing function and for the Abstract_Request handler. We obviously aren't covering unit testing in the SICP study group, but it's an important part of writing good software nevertheless.
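
Since I can't show the real thing, here is a small runnable stand-in for the pattern. Everything specific in it is an assumption made for the example: the processors `mean_processor` and `count_processor` are invented, the "shared" steps are reduced to a type check and a `json.dumps` call, and `GET` takes its input as an argument instead of reading it from web.py.

```python
import json


def request_class_creator(processing_function):
    """Return a new handler class built around processing_function."""
    class AbstractRequest:
        def GET(self, raw_input):
            # Shared step (stand-in): validate the incoming data.
            if not isinstance(raw_input, dict):
                raise ValueError("expected a dict of request parameters")
            # Request-specific step: the only part that varies.
            data = processing_function(raw_input)
            # Shared step (stand-in): serialize the result as JSON.
            return json.dumps({"result": data}, sort_keys=True)
    return AbstractRequest


# Hypothetical processors: each is just the request-specific part.
def mean_processor(params):
    values = params["values"]
    return sum(values) / len(values)


def count_processor(params):
    return len(params["values"])


MeanHandler = request_class_creator(mean_processor)
CountHandler = request_class_creator(count_processor)

print(MeanHandler().GET({"values": [1, 2, 3]}))   # {"result": 2.0}
print(CountHandler().GET({"values": [1, 2, 3]}))  # {"result": 3}
```

Because the processors are plain functions, each can be tested on its own (for instance, checking that `mean_processor({"values": [2, 4]})` gives `3.0`) without touching the shared handler code.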

While the language and the example are different, this is the exact same technique described in Sec 1.3 of SICP. In fact, it's where I learned it from. Python users usually call a function that builds and returns a class a class factory; it is a close cousin of what Python calls a metaclass, which is strictly a class whose instances are themselves classes. Python includes a few constructs to expand how one might use this method, but I think it's safe to say that the basic ideas of it are well-explained in SICP.
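
As a rough illustration of two of those constructs, assuming Python 3 syntax: the three-argument form of the built-in `type` builds a class at run time, and a class that inherits from `type` can customize how other classes are created. The names `Greeter`, `Registry`, and `StatusHandler` are hypothetical.

```python
def get_greeting(self):
    return "hello from " + type(self).__name__


# type(name, bases, namespace) builds a class at run time,
# much like a class statement would.
Greeter = type("Greeter", (object,), {"GET": get_greeting})
print(Greeter().GET())  # hello from Greeter


class Registry(type):
    """A class whose instances are classes; it records each class it creates."""
    handlers = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        Registry.handlers[name] = cls


class StatusHandler(metaclass=Registry):
    def GET(self):
        return "ok"


# The class was registered as a side effect of being defined.
print(sorted(Registry.handlers))  # ['StatusHandler']
```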
