Too Much Code

Not enough cohesion

What we can learn from databases

While not perfect, relational databases are among the most successful software ever used. Their behavior is easily understood even under high concurrency. Sets of changes can be composed with almost no effort, with strong guarantees of consistency. The question is: would this be true if we built our databases the same way we build applications?

Transactions and read consistency would be the first to go. Users would then have to synchronize cooperatively on some external monitor; any failure to do so could result in invalid data. Next, system state and logic would become intermixed. Gone would be the simple ways to inspect the state of the system, or to create a well-defined state space with known transitions. In short, if databases were written like applications, they would become vulnerable to all of the bugs we see in applications.

Suppose we turn the question on its head and ask which characteristics of the database we can apply to the rest of software. Consider the following advantages:
  • The system itself guarantees a consistent view of data under concurrency, eliminating many types of race conditions.
  • State changes are composable; updates are committed atomically, ensuring the database is never in an invalid state.
  • The state space is understandable. Because updates are composed into a single atomic state change, countless intermediate permutations of state are eliminated.
The key is this: we can confidently inspect, reason about, and change the state space of a database. The complexity of large models is mitigated by allowing only valid, composable transitions. Imagine if we had such guarantees when building software in general. We could understand the state of an application at any time, and ensure all changes are valid and consistent. Our systems would be much more understandable and predictable.
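To make the three properties above a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen for illustration: the `Accounts` state, the `withdraw` and `deposit` updates, and the `Store` class are my own names, not a real library. It is nowhere near what a database provides, but it shows the shape of the idea: updates are plain functions, several of them compose into one, and the composed change is either committed whole or not at all.

```python
import threading
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable


@dataclass(frozen=True)
class Accounts:
    """Hypothetical application state; frozen, so a snapshot never changes."""
    checking: int
    savings: int


# An update is a pure function from one state to the next.
Update = Callable[[Accounts], Accounts]


def withdraw(amount: int) -> Update:
    return lambda s: replace(s, checking=s.checking - amount)


def deposit(amount: int) -> Update:
    return lambda s: replace(s, savings=s.savings + amount)


def compose(*updates: Update) -> Update:
    """Combine several updates into one, applied left to right."""
    return lambda s: reduce(lambda acc, u: u(acc), updates, s)


def is_valid(s: Accounts) -> bool:
    """The invariant every committed state must satisfy (an assumption)."""
    return s.checking >= 0 and s.savings >= 0


class Store:
    """Holds the current state and commits updates all-or-nothing."""

    def __init__(self, initial: Accounts) -> None:
        self._state = initial
        self._lock = threading.Lock()

    def read(self) -> Accounts:
        # Callers get an immutable snapshot, never a half-applied change.
        return self._state

    def commit(self, update: Update) -> bool:
        # The lock serializes writers; the new state is computed off to the
        # side and only swapped in if it satisfies the invariant.
        with self._lock:
            candidate = update(self._state)
            if not is_valid(candidate):
                return False
            self._state = candidate
            return True


store = Store(Accounts(checking=100, savings=0))

# Two updates composed into a single transfer and committed as one step.
committed = store.commit(compose(withdraw(30), deposit(30)))

# An overdraft violates the invariant, so the whole transfer is rejected.
rejected = store.commit(compose(withdraw(500), deposit(500)))
```

The point of the design is that the intermediate state (money withdrawn but not yet deposited) is never visible to a reader, and a rejected change leaves the store exactly as it was.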

In fact, much language research is focused on this area. Software Transactional Memory in languages like Haskell is the most visible example. The question is how such progress will reach the mainstream. History suggests an evolutionary model: languages that gain adoption tend to have a good deal in common with an established language, which lowers the barrier to entry. Because of this, I have yet to see a language with the above characteristics that I think will achieve widespread adoption. Hopefully that will change.

Understandable systems today
A couple of recent posts point out the burden of large code bases. I agree, as suggested by the title of this blog, but it’s easy to confuse a symptom with the problem itself. So I phrase it differently: Unmanageable complexity is the enemy. Code size is often what the enemy smells like.

Developers can better manage complexity even without guarantees similar to what a database offers. Code should have a clear, easily understood state space, preferably applying related changes atomically. Such systems are easier to reason about and change because developers need not concern themselves with side effects of unrelated code; they can focus on the problem at hand. For those who haven’t explored this, I’m indirectly describing the functional style of programming. This is the great hope for pure functional programming: it may spread predictability and a simple model to all development.
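As a rough illustration of what a clear state space looks like in everyday code, here is a small Python sketch. The `Order` type, its `Status` values, and the `ALLOWED` table are all invented for this example. The state is an immutable value, the legal transitions are listed in one place, and the only way to move between states is a pure function that returns a new value.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Tuple


class Status(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    SHIPPED = "shipped"


# The whole state space in one table: each status maps to the statuses
# it is allowed to move to. (Hypothetical rules, for illustration only.)
ALLOWED = {
    Status.DRAFT: {Status.SUBMITTED},
    Status.SUBMITTED: {Status.SHIPPED},
    Status.SHIPPED: set(),
}


@dataclass(frozen=True)
class Order:
    status: Status
    items: Tuple[str, ...]


def advance(order: Order, target: Status) -> Order:
    """Return a new Order in the target status, or raise if the move is illegal.

    The function is pure: it never modifies its argument and touches no
    outside state, so unrelated code cannot be affected by calling it.
    """
    if target not in ALLOWED[order.status]:
        raise ValueError(
            f"cannot go from {order.status.value} to {target.value}"
        )
    return replace(order, status=target)


draft = Order(status=Status.DRAFT, items=("book",))
submitted = advance(draft, Status.SUBMITTED)
shipped = advance(submitted, Status.SHIPPED)
```

Because `advance` has no side effects, `draft` is still a draft after the calls above, and answering "what states can this order be in, and how does it get there?" only requires reading the `ALLOWED` table.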