Too Much Code

Not enough cohesion

Micro Benchmarks versus Macro Optimizations in Clojure

Here’s a simple question Clojure users hear often:

What is the overhead of Clojure’s persistent data structures?

I ran headlong into this question while profiling and tuning Clara. Clara aims to draw ideas from expert systems into the Clojure ecosystem, making them available as first-class Clojure idioms. Therefore Clara’s working memory works like other Clojure data structures: it is immutable, and “updates” create a new, immutable working memory.

Anyone skeptical of the benefits of immutability should go watch Rich Hickey’s talks, like Simple Made Easy. Yet these advantages are moot if the resulting system doesn’t perform well enough. So we have a challenge: we know that persistent Clojure structures will lose a micro benchmark comparison against their mutable counterparts, but can we balance that with macro optimizations made possible by immutability? The answer is yes: the techniques below worked for Clara and will probably work for many other projects.

Optimizing Clojure Code

Optimizations should have an objective. My objective with Clara was to make performance at least competitive with the latest version of Drools, which can be used to solve similar problems. Clara’s basis in Clojure offers a number of advantages, but we need to make sure performance isn’t a barrier. So I created the clara-benchmark project, using Criterium to benchmark a number of flows in both Clara and Drools. Some findings:

It’s All About the Algorithms

The first run of profiling didn’t look good. Clara was almost ten times slower than Drools for some cases. But it turns out the bulk of this cost had nothing to do with Clojure — my variation of the Rete algorithm was inefficient, indexing facts for join operations that could never occur due to the nature of the rules. In short, my algorithm sucked.

The good news is that this was easily exposed with a profiling tool and fixed with little effort. I find algorithms in a language like Clojure easier to understand and tune because they are expressed as simple transformations of data. We know the structures we receive and return, and simply need to identify the most efficient way to express the transformation. This is a major contrast to systems that force us to keep track of a bunch of additional state when working through the system.

A better algorithm was the biggest single improvement, bringing Clara to within a factor of two of Drools’ performance, or better, for the use cases tested. But we’re not done yet.

Strongly Isolated Mutability

A better algorithm got us close, but further profiling revealed a bottleneck for some use cases. Rete engines often perform joins over common facts, like this simple example:

(defquery orders-by-customer-id
  "Returns the orders for the given customer."
  [:?id]
  [Order (= ?id customerId) (= ?total total)]
  [Customer (= ?id id)])

This queries the working memory to find customers and orders with the same id. (See the Clara documentation for details on use.)

Clara makes extensive use of Clojure’s group-by function to group collections of facts by matching keys. After tuning my algorithm, I discovered that some benchmarks were spending the bulk of their time in group-by. The group-by implementation can be found in the Clojure source, but here it is for convenience:

(defn group-by
  "Returns a map of the elements of coll keyed by the result of
  f on each element. The value at each key will be a vector of the
  corresponding elements, in the order they appeared in coll."
  {:added "1.2"
   :static true}
  [f coll]
  (persistent!
   (reduce
    (fn [ret x]
      (let [k (f x)]
        (assoc! ret k (conj (get ret k []) x))))
    (transient {}) coll)))

Notice the use of Clojure transients, which are mutable structures designed to be used locally for efficiency. Clojure takes a pragmatic step here. The goal is to keep our systems easy to reason about, and we can achieve that if no external observer can detect mutability. group-by is a pure function working with immutable structures for all observers, but gains performance by using transients internally.
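
To make the pattern concrete, here is a minimal sketch in the same style (an illustrative reimplementation of clojure.core/frequencies, not Clara code):

(defn frequencies-sketch
  "Counts occurrences of each element. The transient map never
   escapes: it is created, mutated, and sealed locally."
  [coll]
  (persistent!
   (reduce (fn [counts x]
             (assoc! counts x (inc (get counts x 0))))
           (transient {})
           coll)))

To every observer this is a pure function; the mutation is an invisible implementation detail.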

The trouble I ran into with my benchmarks is that I had many items mapping to the same key. Notice that Clojure’s group-by uses a transient map, but that map contains non-transient vectors. So the performance bottleneck arose because this group-by function wasn’t “transient enough” for my particular data.

I worked around this by writing an alternate group-by that better fit my needs. Its internals are hideous but are the result of profiling a couple implementations:

(defn tuned-group-by
  "Equivalent of the built-in group-by, but tuned for when
   there are many values per key."
  [f coll]
  (->> coll
       ;; Create a mutable map of transient vectors.
       (reduce (fn [map value]
                 (let [k (f value)
                       items (or (.get ^java.util.HashMap map k)
                                 (transient []))]
                   (.put ^java.util.HashMap map k (conj! items value)))
                 map)
               (java.util.HashMap.))
       ;; Make each vector persistent and put it in a transient map.
       (reduce (fn [map [key value]]
                 (assoc! map key (persistent! value)))
               (transient {}))
       ;; Make the map itself immutable.
       (persistent!)))

This is more efficient when there are many items that map to the same key in the returned map, since it uses transient values. (A Java HashMap turned out to be the fastest option for building the result here, but it never escapes this function.) This optimization cut some benchmark times in half. Combined with a number of smaller tweaks, this brought most of Clara’s use cases in line with Drools’ performance. For other use cases Clara significantly outperforms Drools, but we’ll get to those later.
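
If you want to verify the trade-off for your own data, a quick comparison with Criterium (the same tool used in clara-benchmark) might look like the following; the input shape, many values per key, is exactly the case tuned-group-by targets, and the names here are illustrative:

(require '[criterium.core :as crit])

;; 100,000 items spread over only 10 keys.
(def facts (mapv (fn [i] {:k (mod i 10) :v i}) (range 100000)))

(crit/quick-bench (group-by :k facts))       ;; built-in
(crit/quick-bench (tuned-group-by :k facts)) ;; transient vectors

With few items per key the built-in version may well win, which is why this is a workload-specific optimization rather than a replacement.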

My tuned-group-by function is faster than Clojure’s group-by for some inputs and slower for others. But that comparison misses a bigger advantage: Clojure’s philosophy of separating functions and data made swapping implementations trivial, allowing users to pick the right one for their specific needs. This isn’t so easily done when functions are strongly tied to the data they work with, which is an easy pitfall of object-oriented programming.

Referential Transparency Breeds Serendipity

Writing functional code tends to create pleasant surprises: we come across significant advantages that wouldn’t be possible with a different approach. Consider the following Clara example and its Drools equivalent:

(ns clara.benchmark.visit-order-same-day
  (:require [clara.rules :refer :all]
            [clj-time.coerce :as coerce])
  (:import [clara.benchmark.beans Order Visit]))

(defquery same-day-visit
   "Queries orders that occurred the same day as a visit."
   []
   [Order (= ?id customerId) (= ?day (coerce/to-local-date time))]
   [Visit (= ?id customerId) (= ?day (coerce/to-local-date time))])

Drools equivalent:

package clara.benchmark.drools.visit_order_same_day;

import clara.benchmark.beans.Order;
import clara.benchmark.beans.Visit;

import org.joda.time.DateTime;

query "same_day_visit"

   Order($id : customerId, $day : time.toLocalDate())
   Visit($id == customerId, $day == time.toLocalDate())

end

These rules simply identify things happening on the same day. Yet there is a big difference: Drools does an O(n²) Cartesian join where Clara does an O(n log₃₂ n) indexed join. Therefore Clara becomes dramatically faster than Drools for large inputs in this case. Also notice how Clara integrates cleanly with Clojure’s syntax, as opposed to embedding multiple syntaxes in one file, since it treats rules as a control structure.

This is possible because of Clojure’s emphasis on pure, referentially transparent functions. Since we can replace the function call with its result, we can build an index of that result. The outcome is a significantly more efficient system for this class of problem.
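
A minimal sketch of the idea in plain Clojure (an illustration, not Clara’s actual internals, and it assumes facts are maps with a :time key): because coerce/to-local-date is referentially transparent, we can index one collection by its result and probe that index with the other.

(defn same-day-join
  "Hash join on a pure function of each fact: index visits by day
   once, then probe the index with each order. Roughly linear work
   instead of an O(n²) Cartesian scan."
  [orders visits]
  (let [visits-by-day (group-by #(coerce/to-local-date (:time %)) visits)]
    (for [order orders
          visit (visits-by-day (coerce/to-local-date (:time order)))]
      [order visit])))

An engine can only build such an index if it may assume the indexed expression always returns the same value for the same fact.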

Along the same lines, rule engines’ facilities for reasoning over sets of facts can be implemented more efficiently under these constraints. Clara’s equivalents of Jess and Drools accumulators simply compile into Clojure reducers, making them more efficient than those alternatives by tapping directly into that feature.
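
For instance, a hedged sketch of an accumulator-based query (assuming the acc alias for clara.rules.accumulators, and map- or record-based Order facts, since a keyword accessor is used):

(def total-sales (acc/sum :total))

(defquery get-total-sales
  "Sums order totals; the accumulator runs as a single reduction
   over the matched facts rather than repeated rule activations."
  []
  [?total <- total-sales :from [Order]])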

These advantages arise often: we can defer computation to efficient batch operations. We can transparently spread work across threads without dealing with lock-based concurrency. We can memoize functions or build efficient caches based on fast reference comparison. Importantly, when starting a problem it’s not always obvious how these advantages will arise, but these techniques provide an opportunity for great optimizations at the macro level. David Nolen’s work on Om is a dramatic example of this in action.
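
As a small example of the memoization point, any referentially transparent function can be wrapped without changing the program’s observable behavior, only its cost profile:

;; Safe only because the function is pure: same input, same output.
(def to-local-date-cached (memoize coerce/to-local-date))

Doing the same to a function that reads mutable state would silently change the program’s meaning.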

The Trouble with Benchmarks

“X is faster than Y” makes for a great incendiary headline on Hacker News, but it doesn’t really make sense. X may be faster than Y for workload Z with trade-offs A, B, and C…but for some reason those headlines don’t get as many upvotes.

Benchmarks are informative and an important tool to improve our system. But they aren’t a real measure of a system’s quality or potential. A better measure is how easily a system can be understood, adapted, and expanded. If we can understand the nature of a problem, performance or otherwise, we can usually fix it. Clojure simply provides better mechanisms to understand and improve our systems than other languages I’ve used.

In short, the trouble with benchmarks is that they encourage treating symptoms rather than the explosion of complexity that limits what we can build.

Clara’s Future

All optimizations discussed here are in master and will be released in Clara 0.6.0 this summer. You can see some current comparisons with Drools in the clara-benchmark project. Clara is a relatively new system, so there are still opportunities for improvement. Probably the next significant optimization is greater laziness, which we’re tracking here.

Updates will be posted here and on my twitter feed. I’ll also be discussing modern approaches to expert systems, including Clara, at two conferences over the next few months: Midwest.io in July and strangeloop in September.

Clara 0.5 Released

Clara 0.5 has been released! This is mostly a bug fix release with some usability improvements. See the changelog and the github page for usage information.

Clara 0.4 Released

Clara 0.4 has been released, and is available on Clojars! See the github page for usage information. I wrote about the significant features in the Rules as Data post.

The release announcement is in the Clojure Google Group.

This release puts Clara on a strong footing, and I’m looking forward to playing with the new rules-as-data features.

Rules as Data

This started as a topic for a Lambda Lounge meetup but seems worth sharing broadly.
I’ve posted previously about treating rules as a control structure, but as of Clara 0.4 rules are first-class data structures with a well-defined schema. The simple defrule syntax is preserved, but the macro now produces a data structure that is used everywhere else. For example, the following code:
(defrule date-too-early
  "We can't schedule something too early."
  [WorkOrder (< (weeks-until date) 2)]
  =>
  (insert! (->ApprovalRequired :timeline "Date is too early")))
Defines a var containing this structure:
{:doc "We can't schedule something too early.",
 :name "date-too-early",
 :lhs
 [{:constraints [(< (llkc.example/weeks-until date) 2)],
   :type llkc.example.WorkOrder}],
 :rhs
 (clara.rules/insert!
  (llkc.example/->ApprovalRequired :timeline "Date is too early"))}
The rule structure itself is defined in Clara’s schema, and simply stored in the var with the rule’s name.
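
Since the rule is now just a map held in a var, we can inspect it like any other Clojure data:

(:lhs date-too-early)
;; => [{:constraints [(< (llkc.example/weeks-until date) 2)],
;;      :type llkc.example.WorkOrder}]

(map :type (:lhs date-too-early))
;; => (llkc.example.WorkOrder)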

So, now that rules are data, we open the door to tooling to explore and manipulate them. For instance, we can visualize the relationship between rules. Here is a visualization of the rules used for the meetup. I arbitrarily chose shapes to distinguish the different types of data:



This image is generated by running the following function from clara.tools.viz against the example used at the meetup:
(viz/show-logic! 'llkc.example)
That function simply scans each rule in the given namespace, reads individual conditions from the rule data structure, and wires them up. The graph itself is rendered with GraphViz.

Since rules are data, they can be filtered or manipulated like any Clojure data structure. So here we take our set of rules and display only those that handle logic for validation errors:
(viz/show-logic!
  (filter #(viz/inserts? % ValidationError)
    (viz/get-productions ['llkc.example])))
In this example there is only one rule that does validation, so the resulting image looks like this:

Rule engine as an API

The rules-as-data model creates another significant advantage: we have decoupled the DSL-style syntax from the rule engine itself. Clara users can now create rules from arbitrary sources, such as a specialized syntax, an external database, or domain-specific file format. (Think instaparse tied to rule generation.) Clara is evolving into a general Rete engine, with its “defrule” syntax being just one way to use it.
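
As a hedged sketch of what that enables (assuming mk-session accepts a collection of rule structures, and reusing the schema from the example above), a tool could emit a production directly and load it without ever touching defrule:

(def generated-rule
  {:name "generated-date-too-early"
   :lhs [{:type llkc.example.WorkOrder
          :constraints ['(< (llkc.example/weeks-until date) 2)]}]
   :rhs '(clara.rules/insert!
          (llkc.example/->ApprovalRequired :timeline "Date is too early"))})

(def session (mk-session [generated-rule]))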

So far I’ve written only simple, GraphViz-based visualizations but we can expose these with more sophisticated UIs. Projecting such visualizations onto, say, a D3-based graph could provide a rich, interactive way of exploring logic.

At this point, Clara 0.4 is available as snapshot builds. I expect to do a release in February, pending some cleanup and enhancements to ClojureScript support. I’ll post updates on my twitter feed, @ryanbrush.

Crossing the (data) streams: scalable realtime processing with rules

Pipe-and-filter architectures are among the most successful design patterns ever. They dominate data ingestion and processing today, and give 1970s hackers yet another chance to remind us how they thought of everything years ago.

Unfortunately modern variations of this can run into an ugly problem: what are the semantics of a “join” operation between multiple infinite streams of data? Popular systems like Storm point out this ambiguity, as does Samza. Both provide primitives to support correlation of events between streams, but the higher-level semantics of a join are punted to the user.

This is fine for such systems providing infrastructure, but it is a troublesome model for our applications: if we can’t define clear and compelling semantics for a key part of our processing model, we might be using the wrong model. Projects like Storm offer an excellent infrastructure, but this ambiguity implies that many problems could be solved with a higher-level abstraction.

The challenge with user-defined join semantics is that they come with baggage: maintaining state, structuring it, and recovering it after failure scenarios are challenging problems. It also makes the behavior of the system harder to understand. Since each join can have slightly different behavior, we need to look closely to see what’s going on. A better approach is needed. A set of simple yet flexible join operators would be ideal – so how do we get there?

We might consider CEP-style systems such as Esper and Drools Fusion, which have been tackling declarative-style joins for years. But such systems don’t offer the scalability or processing guarantees of Storm, and they use limited languages that aren’t always expressive enough for sophisticated logic.

We need a data processing model with well-defined, declarative joins that still supports rich application logic. There are lots of options here, but I’ll focus on one: suppose we could make rules as a control structure scale linearly across a cluster, letting the rules engine distribute join operations. Let’s look at an experiment in making Clara, a Clojure-based rules engine, distribute its working memory and processing across a Storm cluster, with all of the scalability and processing guarantees of the underlying system.

Forward-chaining rules on Storm

Imagine a variant of the Rete algorithm implemented with some simple constraints:

First, each condition in a rule can be evaluated independently, so incoming data can be spread across an arbitrary number of processes and match rule conditions appropriately.

Second, aggregations over matched facts follow a map-and-reduce style pattern – where the map and partial reductions of aggregated facts can be done in parallel across machines.

Finally, “joins” of aggregations or individual facts are always hash-based. So joins can be efficiently achieved by sending matched facts to the same node via their hash values.
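
The routing idea in miniature (illustrative only; in practice Storm’s fields grouping does this for us):

;; Facts with equal join keys always map to the same bolt index,
;; so a join never needs state from more than one node.
(defn bolt-for-key [n-bolts join-key]
  (mod (hash join-key) n-bolts))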

The result is our Storm topology looks something like this:


Let’s consider a simple example. Suppose we have feeds of temperature readings from multiple locations in some facility, and we want to take action in those locations should our readings exceed a threshold.

Each temperature reading has a logical timestamp, and since our logic is interested in the “newest” reading, we use a Clara accumulator that selects the item with the newest timestamp:
(def newest-temp (acc/max :timestamp :returns-fact true))
We then use it in a rule that processes all our readings for a location and preserves the newest:
(defrule get-current-temperature
  "Get the current temperature at a location by simply 
   looking at the newest reading."
  [?current-temp <- newest-temp :from [TemperatureReading (== ?location location)]]
  =>
  (insert! (->CurrentTemperature (:value ?current-temp) ?location)))
Note that accumulators preserve minimal state and apply changes incrementally. In this case we keep only the current temperature based on the timestamp; lower values are simply discarded, so we can deal with an infinite stream. Also, this example keeps the maximum, but we could easily accumulate some other value, such as a time-weighted histogram of temperatures so we’re robust to outliers. Any fact that doesn’t match a rule is simply discarded, incurring no further cost.
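
The rolling behavior is easy to see in plain Clojure (a sketch of the idea, not the accumulator’s real implementation):

(defn newer [a b]
  (if (> (:timestamp a) (:timestamp b)) a b))

;; However long the stream, the retained state is a single reading.
(reduce newer [{:timestamp 1 :value 20.0}
               {:timestamp 3 :value 23.5}
               {:timestamp 2 :value 21.1}])
;; => {:timestamp 3, :value 23.5}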

Now that we have the current temperature for each location, we want to back off our devices in those locations if a threshold is exceeded. We can write this as a simple rule as well:
(defrule reduce-device-speed
  "Reduce the speed of all devices in a location that has a high temperature."
  [CurrentTemperature (> value high-threshold)
                      (= ?location-id location)]
  ;; Find all Device records in the location, and bind them to the ?device variable.
  [?device <- Device (= ?location-id location)]
  =>
  (reduce-speed! ?device))
The first condition matches current temperatures that exceed the threshold and binds the location to the ?location-id variable. The second condition finds all devices with a matching location and binds them to the ?device variable. This is then visible on the right-hand side of the rule, where we can take action.

This is effectively performing a join between temperatures that exceeded a threshold at a given location and devices in that same location. When running over Storm, this rule will hash Device and CurrentTemperature facts and send them to the same process based on that hash value. This is done using Storm’s fields grouping over a data stream that connects the bolt instances together.

All state for the join operations is managed internally by Clara’s engine. Accumulators like the one in this example compute in a rolling fashion, merging in new data, retracting previously accumulated values, and inserting new ones. Combined with rule engine-style truth maintenance, developers can simply declare their logic and let the engine maintain state and consistency.

Integration with Processing Topologies

The rules used in this example are here, and are run with the Storm launching code here. There is also a draft Java API to attach rules to a topology. Note that our approach is simply to attach to a Storm topology defined via a provided TopologyBuilder, so users can pre-process or perform other logic in their topology and route data as appropriate into the distributed rule engine. Also, these examples use Clojure records, but they work equally well with Java objects, including ones generated by Thrift, Avro, or Protocol Buffers.

Current State

A prototype of rules over Storm is in the clara-storm project. It also includes the ability to run queries across the rule engine’s working memory, using Storm’s Distributed RPC mechanism. A handful of things need to come together to make this production ready, including:

  • I’d like input and suggestions from members of the Storm community. This topology layout isn’t an idiomatic use of Storm, so we need to ensure this won’t run into problems as we scale.  (This is one of the reasons I’m posting this now.)
  • The ability to persist Clara’s working memory to recover from machine failures. This will probably take the form of writing state changes for each rule node to a reliable write-ahead log, with Kafka being a good storage mechanism.
  • Optimizations ranging from efficient serialization to doing partial aggregations prior to sharing state between bolts are needed.
  • Consider temporal operators in Clara. Accumulators have worked well to this point but may run into limits.
  • Testing at scale!
The biggest takeaway is how technologies like Storm and Clojure provide an opportunity to express computation with higher-level abstractions. Things like SummingBird (and Cascalog 2.0?) offer ways to query data streams. These could be complemented by support for forward-chaining rules for problems easily expressed that way.

Rules as a Control Structure

Rule engines seem to draw love-or-hate reactions. On one hand they offer a simple way to manage lots of arbitrary, complex, frequently-changing business logic. On the other, their simplicity often comes with limitations, and edge cases pop up that can’t be elegantly solved within the confines of rules. There are few things more frustrating than a tool that is meant to help you solve problems but instead creates them.

The tragedy is that excellent ideas for modeling logic with rules have been hijacked by a myth: that it’s possible to write code – to unambiguously define logic in a textual form – without actually writing code. We see authoring tools generating rules in limited languages (or XML!), making the case that domain experts can author logic without development expertise. The shortage of good developers makes this tremendously appealing, and this demand has drawn supply.

If you have a limited problem space satisfied by such tools, then great. But problems remain:

  • Limited problem spaces often don’t stay limited.
  • Many problems involving arbitrary domain knowledge are best solved with rules when we can, but require the ability to integrate with a richer programming environment when we must.
So how do we approach this? We need to stop thinking of rule engines as external systems that create artificial barriers around our logic, and instead treat them as first-class constructs seamlessly integrated into the host language. In other words, rule engines are best viewed as an alternate control structure, suited to the business problem at hand.

Clojure is uniquely positioned to tackle this problem. Macros make sophisticated alternate control structures possible, Clojure’s rich data structures make it suitable for solving many classes of problems, and its JVM integration makes it easy to plug into many systems. This is the idea behind Clara, a forward-chaining rules implementation in pure Clojure.

Here’s an example from the Clara documentation. In a retail setting with many arbitrary, frequently-changing promotions, we might author them like this:
(defrule free-lunch-with-gizmo
  "Anyone who purchases a gizmo gets a free lunch."
  [Purchase (= item :gizmo)]
  =>
  (insert! (->Promotion :free-lunch-with-gizmo :lunch)))
And create a query to retrieve promotions:
(defquery get-promotions
  "Query to find promotions for the purchase."
  []
  [?promotion <- Promotion])
All of this is usable with idiomatic Clojure code:
(-> (mk-session 'clara.examples.shopping) ; Load the rules.
    (insert (->Customer :vip)
            (->Order 2013 :march 20)
            (->Purchase 20 :gizmo)
            (->Purchase 120 :widget)) ; Insert some facts.
    (fire-rules)
    (query get-promotions))
The resulting query returns the matching promotions. More sophisticated examples may join multiple facts and query by parameters; see the developer guide or the clara-examples project for more.

Each rule constraint and action – the left-hand and right-hand sides – are simply Clojure expressions that can contain arbitrary logic. We also benefit from other advantages of Clojure. For instance, Clara’s working memory is an immutable, persistent data structure. Some of the advantages of that may come in a later post.

Rules by domain experts

So we’ve broken down some traditional barriers in rule engines, but it seems like this approach comes with a downside: by making rules a control structure in high-level languages, are we excluding non-programmer domain experts from authoring them?

We can expand our rule authoring audience in a couple ways:

  1. Encapsulate rules into their own files editable by domain experts, yet compiled into the rest of the system. An audience savvy enough to work with, say, Drools can understand the above examples and many others.
  2. Generate rules from higher-level, domain-specific macros. Business logic could be modeled in a higher-level declarative structure that creates the rules at compile time. Generating rules is actually simpler than most logic generation, since rule ordering and truth maintenance are handled by the engine itself. (See the sketch below.)
  3. Tooling to generate rules directly or indirectly. Like all Lisp code, these rules are also data structures. In fact, they are simpler to work with than an arbitrary s-expression because they offer more structure: a set of facts used by simple constraints resulting in an action, which could also contribute more knowledge to the session’s working memory. 
Ultimately, all of this results in Clojure code that plugs directly into a rich ecosystem. 
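
To illustrate the second option, here is a hypothetical sketch of a domain-level macro. It assumes the Purchase and Promotion records from the shopping example above; the macro, its name, and the cost field are invented for illustration:

(defmacro defspend-promotion
  "Expands a one-line business declaration into an ordinary defrule."
  [rule-name min-total promotion-key reward]
  `(defrule ~rule-name
     [~'Purchase (> ~'cost ~min-total)]
     ~'=>
     (insert! (->Promotion ~promotion-key ~reward))))

;; One line of domain language per promotion:
(defspend-promotion free-lunch-with-big-purchase 100 :free-lunch :lunch)

Because the expansion is ordinary rule data, the engine still handles ordering and truth maintenance; the macro only captures the domain vocabulary.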

These options will be fun to explore, but this isn’t my initial target. Let’s first create a useful tool for expressing complex logic in Clojure. Hopefully this will become a basis for exploration, borrowing good ideas for expressing business rules and making them available in many environments via the best Lisp available.

If this seems promising, check out the Clara project on github for more. I’ll also post updates on twitter at @ryanbrush.

Update: see the Clojure Google Group for some discussion on this topic.

A long time ago, we used to be friends

No one ever intends to let their blog go dark, but here we are. I’m planning on dusting off this blog to write about some new personal projects, but for now I thought I’d link to writing I’ve done in other venues:

I’ve written a few posts for Cerner’s Engineering blog as part of my work at Cerner. These include:
Some of this content is from talks that I’ve given at Hadoop World, ApacheCon, and StampedeCon.

Some ideas originally posted to this blog have been published in the book 97 Things Every Programmer Should Know.

My latest personal project has been the construction of a forward-chaining rules engine in Clojure. I expect this and related problems to be the emphasis of this blog moving forward.

Start by embracing your limits

Nothing in human history has offered more promise, yet seen as many failures, as software. We’ve all seen moments of greatness, where a program seems like magic – but such gems are surrounded by minefields of bugs and indecipherable interfaces.

The result of all this is we programmers are often a frustrated bunch. But should we be? After all, what makes us think that as a species we should have the aptitude to create great software? Our skill sets evolved in an environment that favored those who could hunt boar and find berries – any capacity to succeed in the abstract world of software is a pure, accidental side effect. Perhaps we should be awed by software’s successes rather than frustrated by its failures.

The good news is we can improve despite our limitations, and it starts with this: accept that we generally have no innate ability to create great systems, and design our practices around that. It seems like every major step forward in software has followed this pattern of embracing our limitations. For instance, we move to iterative development since we can’t anticipate all variables of a project. We aggressively unit test because we realize we’re prone to error. Libraries derived from practical experience frequently replace those built by expert groups. The list goes on.

This type of admission is humbling, but it can also be liberating. Here’s an example: in years past I would spend hours agonizing over an internal design decision for a system I was building. I figured if I got it right we could easily bolt on some new feature. Sometimes I was right, but often I was not. My code was often littered with unnecessary structure that only made things more complicated.

Contrast that to today: I know I can’t anticipate future needs in most cases, so I just apply this simple heuristic:
  1. When in doubt, do the simplest thing possible to solve the problem at hand.
  2. Plan on refactoring later.
The first step frees us from trying to anticipate all future needs – but this is not quick-and-dirty cowboy coding. An essential element of good code is that it creates an understandable and maintainable system. Don’t try to code for future needs. Instead, structure code for present needs so it can be leveraged in the future.

So how do we do this? A couple things to keep in mind:
  • When in doubt, leave it out. (also known as “You Ain’t Gonna Need It”)
  • Unit-Testable designs tend to be reusable designs. Unit tests not only catch bugs that can result from refactoring, but they encourage modularity to enable that refactoring.
  • Don’t try to design an API independently of an application. You won’t understand your users’ needs well enough to create a good experience. Build the API as part of the application to make sure its needs are met, then factor out and generalize.
  • Group code together that tends to change for similar reasons. If your Widget class is responsible for rendering its UI and writing to the database and business logic, you can’t use it anywhere else without significant changes. High cohesion and loose coupling.
There are no hard-and-fast rules to building software, but we all need a compass to help guide us through the thousands of micro-decisions we make every time we write code. Hopefully this post can help build that compass.

Can “Agile in the Large” Succeed?

Agile Development isn’t perfect, but it got something right. Right enough, at least, to gain enough momentum to become a buzzword – and for consultants to latch on by selling “Agile” to big enterprises. The result is “Agile in the Large”.

Agile in the Large ranges from the Half-Arsed Agile Manifesto to Cargo Cult practices. Too often we simply bolt on a few agile practices without understanding how they provide value. Adding scrums and iteration reviews to your current process and expecting improvement is like building a runway in your back yard and expecting planes to land.

Whether it’s labelled agile or not, a cohesive development team works for a couple of reasons. It creates a focused team fully aligned to a common goal. It avoids “Us and Them” mentalities, enabling everyone to adapt to meet the goal. And it offers a self-correcting strategy for the thousands of micro-decisions made when building software: code must be testable, “You Ain’t Gonna Need It” applies, and quick feedback makes sure you’re solving the actual problem.

Agile in the Large is flawed because it often fails to achieve these basic ingredients. Take Scrum of Scrums, for example. It tries to coordinate and resolve technical dependencies between teams by having representatives from each team get together and talk through them. It’s better than nothing, but an emerging system has too much ambiguity: the interaction between components is still being defined, bugs can lead to finger pointing, and the important sense of ownership is lost. Everyone feels like a cog in a machine rather than someone solving an important problem for a real user. Our ability to adapt the micro-decisions that build great software is lost.

In fact, Agile in the Large seems doomed from the beginning. Once you’re beyond some number of people in a single project, it’s impossible to create that sense of shared ownership and adaptability. Great software is created by small, dedicated teams.

So, what then? It seems the only way out is to make Agile in the Large more like Agile in the Small. A team should be working on one and only one project, and include everything necessary for that project to be successful. How we do that depends on where we are in a project’s maturity curve.

New vs. mature systems
New development is the most easily handled. Look at the problems at hand, and make sure the team is equipped to handle them end-to-end. This new project may have several components, but now is not the time to split those components into their own teams; keep a single team aligned to the user’s goals. After all, if a component isn’t aligned to some user’s goal, what is its value? Organize around components too early, and team cohesion is lost.

So we are off to a good start, but we need to adjust our strategy as a successful project matures. You may find parts of your system have value for other uses. This pattern seems timeless: C was first a tool for building Unix, Ruby on Rails was an application before it was a framework, Amazon was an online bookstore before it was a cloud platform, and so on. In all of these cases a successful system was built, and then groups arose around the reusable pieces. Reusable technology will naturally arise from a successful project. Embrace that, but don’t force it.

Open source as a model for reuse
Later in a system’s life we find ourselves consuming many assets from a variety of teams. Now it’s easy to let coordination and communication friction kill our focused project. Fortunately, the open source world gives us the answer.

Quite simply, needed enhancements to common assets should be written and contributed by consuming teams. This offers several advantages over logging an enhancement request. For instance, we reduce deadline haggling and misaligned schedules. We also reduce the need for frequent status updates and opportunities for miscommunication between teams.

Of course, not all changes to open source projects come in the form of patches. There should still be a team around the asset in question, responsible for its architecture and fundamental direction. That team also operates as a gatekeeper, ensuring all patches are properly unit tested, documented, and don’t threaten the integrity of the system.

Changes must be handled on a case-by-case basis, but the primary mode of operation should follow the agile ideal: a single team with responsibility for a project end-to-end. This includes contributing to assets it consumes.

What if even the initial project is too big?
I’ll touch on one final question: what if even the initial scope of a project is beyond what a single team can accomplish? Find a smaller initial scope, get it working, and grow from there. Do this even if the initial scope means building placeholders that are discarded in the final result. The time saved and friction eliminated by creating a single, focused team will outweigh this. Think of it as scaffolding that eventually gets removed.

In the end, Agile in the Large only works if we make it more like Agile in the Small. Hopefully this article is a step in that direction.

Beware of the Flying Car

Every so often we developers are asked to build a flying car. Our users obviously need one to avoid traffic and get to meetings on time. So we set down the path to meet those needs.

The trouble is most of us have no idea how to build a flying car. Even so, this is the project, so we’d better get started! Our first release of the flying car will be small, and carried around on a string. We show good progress toward our users’ needs and gain approval from our superiors. We will simply remove the string in a later release.

Hopefully most of us will recognize a Flying Car Project and take time to understand and address the users’ goals rather than sprinting toward a brick wall. The Flying Car is a means, not an end in itself – and there are other, executable means to the desired end.

Of course, there will always be someone eager to go build that flying car. If that happens, try to steer them in the right direction. If all else fails, just don’t be standing underneath it when they cut the string.