Too Much Code

Not enough cohesion

Part 3: Building a Firewall Against Complexity

In the first two posts in this sequence, I argued a few key points:
  • A language must be usable by mediocre developers to achieve widespread use.
  • A language that prevents quick and dirty application development will not be widely adopted.
  • Widespread adoption is important so good programmers can use better languages in their day jobs.
  • Haskell shows important innovation in language design, but has major hurdles to widespread adoption.
Assuming the above, we have a big problem: how do we make the essential innovation in Haskell available to the broader industry? We need to first define the essence of Haskell’s contribution, and then look for ways to package that essence for widespread use.

I will use a simple model of software problems to define Haskell’s essential contribution. Complexity kills, but Haskell helps contain it. We were in trouble the moment systems got too big for a single human to understand, and we kept digging the hole deeper. This needs to stop.

So far the best answer to complexity has been good design abstraction. And there have been successes – I usually don’t think about the virtual memory system when writing code, for instance. But as Joel Spolsky pointed out, ”All non-trivial abstractions, to some degree, are leaky.” I don’t usually think about virtual memory, but if I have locality of reference problems and huge numbers of page faults, the abstraction leaks. Suddenly I have to think about everything my program does, everything its libraries do, all existing abstraction leaks, and the virtual memory system. Soon there isn’t room in my limited mind to keep track of everything. The result is almost inevitable: a system gets more complex, details are missed, and defects creep in.

To make matters worse, the difficulty of understanding a system grows exponentially with size. Every variable added must be tracked with every other variable. I see two possible solutions to this:
  1. Stop building such complex software.
  2. Find ways to build airtight firewalls around complexity. Stop the leaks.
History suggests option one is not likely. Great programmers can solve complex problems with simple code, but this is not the broader trend. Even if it was, this answer is unsatisfying; it’s really just avoiding the problem.

The second solution might have potential. Imagine if we had abstractions that didn’t leak, even if they could only be used in special conditions, or applied to certain problems. There will always be complexity, but by bounding it with airtight abstractions we limit the number of variables involved. Heck, if we do a great job, it’s possible even we humans will be able to understand the systems we build.

This is where we reach the essential contribution of Haskell (or any pure functional language): it builds a firewall around the rest of the system’s complexity. You only have to think about the part you’re working, and you can make that small enough to be understandable. Because functions can’t have side effects, you don’t have to worry about locking a resource or dealing with inconsistent state. Your universe is defined by a function’s arguments.

So far, so good…but we still need to package our firewall against complexity to be more easily adopted and used. Notice large parts of functional languages are not included in the essential contribution. For instance, a chunk of code could be written in any style, as long as its side effects are encapsulated. Here we break away from the pure functional model. If local destructive updates, statements, for loops, or other imperative constructs make coding easier, we can allow them. The key is not allowing these local decisions escape; our firewall against complexity must hold. A similar point came up in a recent discussion on Channel 9. Erik Meijer described it as “local impurity, but global purity”.

This is a key element for our hypothetical new language: allow developers to code in their style of choice, provided it does not compromise the system’s guarantees. The merits of different styles are open to debate, but surely no one style is the best for all programs. Ultimately, our hypothetical language might really be a set of languages, all keeping the same guarantee of isolation. The key is we can build a language with stronger guarantees than any commonly used language, but make it easy for most developers to adopt.

Can we live without state?
System state is the biggest source of complexity for imperative programs. It’s also the biggest reason imperative code is considered easier for quick and dirty solutions. You want to change state? Go for it. Just be sure you synchronize with other threads, and your state change is consistent with the rest of the system, and you leave the system in a consistent state in case of error, and…well, hope others changing the state follow the same rules.

All of these problems go away with functional languages. But I can’t shake the idea that a language restricting all access to state simply won’t get adopted. Haskell elegantly solves this solution with the Monad; you write code in isolation, and simply describe the side effects to be evaluated as needed. Sounds good, until you run into problems like the one described here. In that case, the developer discovered a deep function needed some additional piece of information, and had to refactor large amounts of code to get it there. This drawback may well be justified by the advantages of isolated coding. Then again, our goal is to get a better language adopted, and most businesses will reject one that makes writing quick and dirty code harder, even if there is a net gain.

I really hope I’m wrong about this one. (And I’m sure many will try to convince me that I am.) I suspect, though, that languages must make accessing state information easy to gain widespread adoption. Is there a way we can do this while holding on to our firewall against complexity? Maybe. There are some tempting ideas that might solve this. Possibilities include Software Transactional Memory (STM), or messaging-based models like Erlang. The answer to which is better must come from experimentation, although STM elegantly solves some problems others do not, as Tim Sweeny described (pdf). Assuming an adopted system must be able to easily change state, we can still use STM to attain many advantages of the functional style:
  • No race conditions or deadlock. STM can make code seem like it’s running in a single-threaded environment. This is our firewall against concurrency complexity.
  • No inconsistent state. If any error occurs, the transaction is simply rolled back. As a result, the state space of the system is dramatically reduced, and all possible states are logically consistent.
  • External side effects (like database updates) should happen in the same transaction as memory updates, again ensuring consistent state.
A new language could allow only special annotated functions to use STM, clearly indicating which functions are logically pure. This is analogous to implicitly passing a Haskell Monad to every pertinent function. Yes, we are sacrificing functional purity. However, we keep our firewall, and build a language that just might find a widespread audience. Also, we need not sacrifice other great features of functional languages to get there. For instance, a function may do destructive updates internally but always return an immutable object, lending itself to memoization.

In summary, we have two new requirements for our hypothetical language:
Developers should be able to choose their style of programming, but still have strong guarantees of isolation from the rest of the system.
The system should limit the number of variables the developer must track while writing code.
In my next post, I will add more requirements in this spirit, and hopefully draw these thoughts to a close.