Sunday, May 3, 2009

Closure: I Don't Think That Word Means What You Think It Means

It is curious that my weakest subject in school was English, and that I struggle so much with the miss use of certain words. Words really do matter. As words are misused, it starts to become commonplace which through a domino effect results in the complete lose of their meaning.

The word most in jeopardy today in the domain of programming is Closure.

WikiPedia has a good definition which states: a closure is a first-class function with free variables. The critical phrase is: free variables. I realize how subtle this is... however without the free variables it is just another function... and IMO it doesn't have to be a function that is just the common mechanism. Adding to the list of good examples is the docs on closures in Javascript.

Examples that are wrong or vague are commonplace. First Example: Groovy, which defines it merely as a code block... or a second example, how about Zdeněk Troníček's blog support Closures in Java which defines it as an anonymous function. A Google search will provide a fairly equal number of good and bad examples.

Defining Closure
The point to be made here is that if you pull the "free variables" out of the definition of a closure than all you have an anonymous function and the two words would be synonymous making one redundant to the other. Put another way... Closures are typically anonymous methods, but not all anonymous methods are closures.

Understanding Variable Scope
When defining a variable, our typical scopes are global, part of a class, parameter to a method, or local to a method. This would be complex if we tried to discuss this for all languages, so I will focus on Java. In Java there is no global scope, this is typically handled as a static member of a class. In this case, it lives in memory in a space called perm space. If the variable is part of a class, then the instance is part of heap memory for an instance of that class. If it is a passed parameter or a local member variable, then the storage is part of the stack frame pushed on the stack. As a reminder, as a method returns the stack frame for that method is popped off the stack and the scope has ended for local variables, references, etc. What is interesting about the proposition of a closure, is we want to be able to create a method (method 1) with some "variables" in the scope of another method (method 2). After the return of method 2, which means everything is out of scope, we still want to be able to pass method 1 to other parts of the program with the state of local variables maintaining state. Perhaps you can understand the issue now. If the scope is lost, where and how is the state of the anonymous method 1 being maintained? This is what closure is, and it is up to the language, compiler or run-time to figure that out.

I prefer to consider closures as another level of scope, as oppose to just another function.

Functional Example
It is easier IMO to see this with a functional language, so here is an example:
Function powerFunctionFactory (int power) {
int pwrFunction(int base) {
return pow(base, power);
}

return pwrFunction;
}

Function sqr = powerFunctionFactory (2);
Function cube = powerFunctionFactory (3));

sqr (3);
cube(3);
Looking at this example when the function sqr is defined by invoking powerFunctionFactory(2), the scope of the value of 2 should be lost through normal scoping rules, however due to the feature of closure it is not and this technique is possible for a number of languages. Not only must it maintain state, but it must do it for each instance of closure. Notice that the cube function has a power state all it's own.

In closing I just ran across this great post by Neal Gafter, which is titled the definition of Closures. I didn't get to it in time to reference his material, which looks great. Perhaps there will be a follow up.

If we can't agree on the nature of closures containing variables... we will never understand monads :)