Think back to some of your earliest programming days. Unless you first learned LISP or the like (bless your little heart), you probably learned on an imperative programming language like C, PASCAL or BASIC. I say you learned “on” a language, because this initial experience shapes our idea of what a computer can and can’t do and what a programming language is “supposed” to be like. One of the most basic assumptions that these common first languages is that everything in your program has a canonical name. A variable has a name, and when you want to refer to it, you call it by name. A method has a name, and to invoke it, you use the name (and pass any parameters). Likewise, classes, modules, packages all have names, which you define when you write the program and the compiler interprets. Something which you may not have been exposed to early on is methods without names — so-called “anonymous methods.”
A Name for Every Function (The View from C)
Let’s start out looking at a language that doesn’t support anonymous methods: C. In C, a function name is more or less identical to a label in assembly language, that is, it’s a mnemonic for a memory address to be jumped to from other code. You can pass function pointers in C, but these are just pointers to that same address that you originally defined the function at. You can’t create a new function at runtime in C and assign it to a function pointer. So, if I want to do a dynamic sort routine in C, I might do something like this:
#include
void BubbleSort(int *numbers, int array_size, int(*comparer)(int, int))
{
int i, j, temp;
for (i = (array_size - 1); i >= 0; i--) {
for (j = 1; j <= i; j++) {
if (comparer(numbers[j-1], numbers[j]) > 0) {
temp = numbers[j-1];
numbers[j-1] = numbers[j];
numbers[j] = temp;
}
}
}
}
int IntegerComparer(int a, int b)
{
return a - b;
}
int ReverseIntegerComparer(int a, int b)
{
return b - a;
}
int main(int argc, char** argv)
{
int nums[] = {3, 4, 21, 17, 100 };
BubbleSort(nums, 5, IntegerComparer);
printf("Regular order:\n");
int i;
for (i = 0; i < 5; i++) {
printf("%d\n", nums[i]);
}
BubbleSort(nums, 5, ReverseIntegerComparer);
printf("Reverse order:\n");
for (i = 0; i < 5; i++) {
printf("%d\n", nums[i]);
}
}
Pretty verbose, I know, but we’ll use this as the basis for all of our future examples. The point is, I can implement a “generic” bubble sort in C that takes a function pointer as an argument that I can use to compare two integers. If I wanted to, I can sort all the odd numbers to the beginning of the list, or I can sort all the primes to the end, etc. just by passing a different comparer.
However, these comparer functions have to be defined separately in the module and I refer to them by their canonical name (the name I used when I defined them) when I pass them as pointers. Looking at the assembly code generated by gcc, you can see that these canonical names are directly translated into assembly labels:
.globl _IntegerComparer
.def _IntegerComparer; .scl 2; .type 32; .endef
_IntegerComparer:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %edx
movl 8(%ebp), %eax
subl %edx, %eax
popl %ebp
ret
.globl _ReverseIntegerComparer
.def _ReverseIntegerComparer; .scl 2; .type 32; .endef
_ReverseIntegerComparer:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
subl %edx, %eax
popl %ebp
ret
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
And, when I call the BubbleSort function with my function pointer, I’m just passing the addresses of these labels:
movl $_ReverseIntegerComparer, 8(%esp)
movl $5, 4(%esp)
leal -40(%ebp), %eax
movl %eax, (%esp)
call _BubbleSort
movl $LC2, (%esp)
call _printf
movl $0, -44(%ebp)
This is the closest thing to an anonymous method in a language like C, and, as you’ll see, it’s pretty far from that target.
Introducing “Anonymous”
At this point, I’ll switch from C into C# 3.0, a language that allows for anonymous methods. Truly anonymous methods were introduced in C# 2.0 with a much more verbose syntax; I’ll use the simpler “lambda” syntax introduced in 3.0. The key conceptual difference between languages that allow for anonymous methods and languages that don’t is that all languages with anonymous methods have first-class functions. In brief, this means that a function definition is an expression and has a value at runtime which can be passed around, used as a function parameter or returned as a return value.
In contrast, in C, our function definition was a statement to the compiler which told it, in essence, to create a label and handle our parameters and return value according to some standard. The definition had no value at runtime. The only thing we had access to at runtime was the address of the label the compiler created for us.
In C#, the following is completely valid:
public void Foo()
{
Func sorter = (a, b) => a - b;
Console.WriteLine(sorter(1, 2));
}
This prints out -1. The right-hand side of that assignment to “sorter” is what is lovingly referred to as “lambda” expression by the C# team and lovers of the lambda calculus the world over, and it’s a way of defining an anonymous function at runtime. In this case we defined a function that takes two integer parameters called “a” and “b” and returns an integer a - b. Then, we called this function with a = 1 and b = 2, which returned -1.
You might be saying, wait a minute, that function does have a name: “sorter”! Well, let’s change things up a bit:
public Func GetSorter()
{
return (a, b) => a - b;
}
This is still perfectly valid, and now the function doesn’t have any name that we can refer to in this block of code, and it returns a function. So, you might now well ask, what does it mean to return a function? Is it an address to jump to? Is it some kind of intermediate data structure defined by the runtime? Is it text that will be interpreted by some sort of runtime interpreter? It could be any of these, although for C# it is more or less a combination of the first two.
Generally speaking, in a compiled language such as C#, anonymous functions do actually have a canonical name (and an associated assembly label), but this name is auto-generated by the compiler. What you’re passing around when you define a local variable to point at an anonymous method, or you return one from another function is an intermediate data structure that, at some level, contains a pointer to this label, and, therefore, the compiled body of your function. In this structure, the compiler also stores a reference to the lexical scope in which the anonymous function was defined. This combination of function-body and lexical scope is called a “closure” and is of paramount importance in functional programming (more on this in a future post).
Implementations will obviously vary, but these two pieces of data are essential to any proper implementation of anonymous methods.
So, using anonymous methods, we can redefine our dynamic sorter as follows in C#:
static void BubbleSort(int[] numbers, Func comparer)
{
int i, j, temp;
for (i = (numbers.Length - 1); i >= 0; i--)
{
for (j = 1; j <= i; j++)
{
if (comparer(numbers[j - 1], numbers[j]) > 0)
{
temp = numbers[j - 1];
numbers[j - 1] = numbers[j];
numbers[j] = temp;
}
}
}
}
static void Main(string[] args)
{
int[] nums = {3, 4, 21, 17, 100 };
BubbleSort(nums, (a,b) => a - b);
Console.WriteLine("Regular order:");
int i;
for (i = 0; i < 5; i++)
{
Console.WriteLine(nums[i]);
}
BubbleSort(nums, (a, b) => b - a);
Console.WriteLine("Reverse order:");
for (i = 0; i < 5; i++)
{
Console.WriteLine(nums[i]);
}
}
So, it’s shorter and sweeter, right? In fact, there are a couple more reasons to use anonymous methods whenever you can:
- They keep your namespace clean. Your namespace isn’t polluted with lots of one-off functions and helpers. You don’t have to worry about if you already have a function defined called “SorterHelper,” or the like.
- Your function bodies are in the code where they are used. This makes it easier to see what’s going on in your code.’
But I haven’t shown you anything here yet that you can’t do with C, I’ve just shown you an easier and more terse way of expressing things. If you know anything about functional programming, you’re probably screaming at me right now for glossing over the most important fact about anonymous functions:
- The have access to the lexical scope in which they are defined.
I’ve been intentionally avoiding this topic, because it’s big. Really big. So big, in fact, that it needs its own post. Check out part 2 of this post, dealing with closures and other big functional programming ideas!
November 18th, 2008 | Tags: functional programming | Comments (3)