Too Much Code

Not enough cohesion

Clara 0.4 Released

Clara 0.4 has been released, and is available on Clojars! See the github page for usage information. I wrote about the significant features in the Rules as Data post.

The release announcement is in the Clojure Google Group.

This release puts Clara on a strong footing, and I’m looking forward to playing with the new rules-as-data features.

Rules as Data

This started as a topic for a Lambda Lounge meetup but seems worth sharing broadly.
I’ve posted previously about treating rules as a control structure, but as of Clara 0.4 rules are first-class data structures with a well-defined schema. The simple defrule syntax is preserved, but the macro now produces a data structure that is used everywhere else. For example, the following code:
(defrule date-too-early
  "We can't schedule something too early."
  [WorkOrder (< (weeks-until date) 2)]
  =>
  (insert! (->ApprovalRequired :timeline "Date is too early")))
Defines a var containing this structure:
{:doc "We can't schedule something too early.",
 :name "date-too-early",
 :lhs
 [{:constraints [(< (llkc.example/weeks-until date) 2)],
   :type llkc.example.WorkOrder}],
 :rhs
 (clara.rules/insert!
  (llkc.example/->ApprovalRequired :timeline "Date is too early"))}
The rule structure itself is defined in Clara’s schema, and simply stored in the var with the rule’s name.
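Because the rule is just a map stored in a var, we can inspect it with ordinary Clojure functions. A quick REPL sketch against the structure above:
;; The fact types consumed by the rule's conditions.
(map :type (:lhs date-too-early))
;; => (llkc.example.WorkOrder)

;; The constraints applied to those facts.
(mapcat :constraints (:lhs date-too-early))
;; => ((< (llkc.example/weeks-until date) 2))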

So, now that rules are data, we open the door to tooling to explore and manipulate them. For instance, we can visualize the relationship between rules. Here is a visualization of the rules used for the meetup. I arbitrarily chose shapes to distinguish the different types of data:



This image is generated by running the following function from clara.tools.viz against the example used at the meetup:
(viz/show-logic! 'llkc.example)
That function simply scans each rule in the given namespace, reads individual conditions from the rule data structure, and wires them up. The graph itself is rendered with GraphViz.
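The scan itself is straightforward. A rough sketch of the approach (not the actual clara.tools.viz implementation) might look like this:
(defn rule-edges
  "Return [fact-type rule-name] pairs for each condition of each rule
   var in the (already loaded) namespace; these become the edges handed
   to GraphViz."
  [ns-sym]
  (for [[_ v] (ns-interns ns-sym)
        :let [rule @v]
        :when (and (map? rule) (:lhs rule) (:rhs rule)) ; crude check for a rule structure
        condition (:lhs rule)]
    [(:type condition) (:name rule)]))

(rule-edges 'llkc.example)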

Since all rules are data, they can be filtered or manipulated like any other Clojure data structure. Here we take our set of rules and display only those that handle validation errors:
(viz/show-logic!
  (filter #(viz/inserts? % ValidationError)
    (viz/get-productions ['llkc.example])))
In this example there is only one rule that does validation, so the resulting image looks like this:

Rule engine as an API

The rules-as-data model creates another significant advantage: we have decoupled the DSL-style syntax from the rule engine itself. Clara users can now create rules from arbitrary sources, such as a specialized syntax, an external database, or a domain-specific file format. (Think instaparse tied to rule generation.) Clara is evolving into a general Rete engine, with its “defrule” syntax being just one way to use it.
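For instance, we can build the same kind of structure shown earlier without defrule at all, perhaps from rows in a database, and hand it to the engine. A hedged sketch; how mk-session consumes a raw collection of productions may vary between Clara versions:
(def generated-rule
  {:name "date-too-early-generated"
   :lhs  [{:type llkc.example.WorkOrder
           :constraints ['(< (llkc.example/weeks-until date) 2)]}]
   :rhs  '(clara.rules/insert!
            (llkc.example/->ApprovalRequired :timeline "Date is too early"))})

;; Treat the generated structure as a rule source like any other.
(mk-session [generated-rule])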

So far I’ve written only simple, GraphViz-based visualizations but we can expose these with more sophisticated UIs. Projecting such visualizations onto, say, a D3-based graph could provide a rich, interactive way of exploring logic.

At this point, Clara 0.4 is available as snapshot builds. I expect to do a release in February, pending some cleanup and enhancements to ClojureScript support. I’ll post updates on my twitter feed, @ryanbrush.

Crossing the (Data) Streams: Scalable Realtime Processing With Rules

Pipe-and-filter architectures are among the most successful design patterns ever. They dominate data ingestion and processing today, and give 1970s hackers yet another chance to remind us how they thought of everything years ago.

Unfortunately modern variations of this can run into an ugly problem: what are the semantics of a “join” operation between multiple infinite streams of data? Popular systems like Storm point out this ambiguity, as does Samza. Both provide primitives to support correlation of events between streams, but the higher-level semantics of a join are punted to the user.

This is fine for systems providing infrastructure, but it is a troublesome model for our applications: if we can’t define clear and compelling semantics for a key part of our processing model, we might be using the wrong model. Projects like Storm offer excellent infrastructure, but this ambiguity suggests that many problems would be better served by a higher-level abstraction.

The challenge with user-defined join semantics is that it comes with baggage: maintaining state, structuring it, and recovering it after failures are hard problems. It also makes the behavior of the system harder to understand. Since each join can have slightly different behavior, we need to look closely to see what’s going on. A better approach is needed. A set of simple yet flexible join operators would be ideal – so how do we get there?

If we can’t define clear and compelling semantics for a key part of our processing model, we might be using the wrong model.
We might consider CEP-style systems such as Esper and Drools Fusion, which have been tackling declarative-style joins for years. But such systems don’t offer the scalability or processing guarantees of Storm, and they use limited languages that aren’t always expressive enough for sophisticated logic.

We need a data processing model that offers well-defined, declarative joins while still supporting rich application logic. There are lots of options, but I’ll focus on one: suppose we could make rules as a control structure scale linearly across a cluster, letting the rule engine distribute join operations. Let’s look at an experiment in making Clara, a Clojure-based rules engine, distribute its working memory and processing across a Storm cluster, with all of the scalability and processing guarantees of the underlying system.

Forward-chaining rules on Storm

Imagine a variant of the Rete algorithm implemented with some simple constraints:

First, each condition in a rule can be evaluated independently, so incoming data can be spread across an arbitrary number of processes and match rule conditions appropriately.

Second, aggregations over matched facts follow a map-and-reduce style pattern – where the map and partial reductions of aggregated facts can be done in parallel across machines.

Finally, “joins” of aggregations or individual facts are always hash-based. So joins can be efficiently achieved by sending matched facts to the same node via their hash values.

The result is our Storm topology looks something like this:


Let’s consider a simple example. Suppose we have feeds of temperature readings from multiple locations in some facility, and we want to take action in those locations should our readings exceed a threshold.

Each temperature reading has a logical timestamp, and since our logic is interested only in the “newest” reading, we use a Clara accumulator that selects the item with the newest timestamp:
(def newest-temp (acc/max :timestamp :returns-fact true))
We then use it in a rule that processes all our readings for a location and preserves the newest:
(defrule get-current-temperature
  "Get the current temperature at a location by simply 
   looking at the newest reading."
  [?current-temp <- newest-temp :from [TemperatureReading (== ?location location)]]
  =>
  (insert! (->CurrentTemperature (:value ?current-temp) ?location)))
Note that accumulators preserve minimal state and apply changes incrementally. In this case we keep only the current temperature based on timestamp; older readings are simply discarded, so we can deal with an infinite stream. Also, this example keeps the maximum, but we could just as easily accumulate some other value, such as a time-weighted histogram of temperatures, so we’re robust to outliers. Any fact that doesn’t match a rule is simply discarded, incurring no further cost.
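To make that concrete, the accumulator is the only piece that would change. Here is a sketch that tracks a rolling average per location instead of the single newest reading, assuming Clara’s built-in acc/average; a time-weighted histogram would be a custom accumulator plugged in the same way:
(def average-temp (acc/average :value))

(defrule get-average-temperature
  "Track a rolling average of readings at each location, which is
   less sensitive to a single outlier than the newest value alone."
  [?avg <- average-temp :from [TemperatureReading (== ?location location)]]
  =>
  (insert! (->CurrentTemperature ?avg ?location)))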

Now that we have the current temperature for each location, we want to back off our devices in those locations if a threshold is exceeded. We can write this as a simple rule as well:
(defrule reduce-device-speed
  "Reduce the speed of all devices in a location that has a high temperature."
  [CurrentTemperature (> value high-threshold)
                      (= ?location-id location)]
  ;; Find all Device records in the location, and bind them to the ?device variable.
  [?device <- Device (= ?location-id location)]
  =>
  (reduce-speed! ?device))
The first condition matches current temperatures that exceed the threshold and binds the location to the ?location-id variable. The second condition finds all devices with a matching location and binds each to the ?device variable. The binding is then visible on the right-hand side of the rule, where we can take action.

This effectively performs a join between temperatures that exceed the threshold at a given location and the devices in that same location. When running over Storm, this rule will hash Device and CurrentTemperature facts and send them to the same process based on that hash value. This is done using Storm’s fields-grouping functionality over a data stream that connects the bolt instances together.
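Conceptually, the routing is nothing more than consistent hashing on the join key. A hypothetical helper makes the idea explicit; Storm’s fields grouping does the equivalent for us:
(defn target-task
  "Pick the bolt task that owns a join key, so every fact sharing that
   key ends up in the same working memory partition."
  [join-key num-tasks]
  (mod (hash join-key) num-tasks))

;; A CurrentTemperature and a Device for the same location route to the
;; same task, so the join happens locally on that node.
(target-task :assembly-line-3 8)
(target-task :assembly-line-3 8)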

All state for the join operations is managed internally by Clara’s engine. Accumulators like the one in this example compute in a rolling fashion, merging in new data, retracting previously accumulated values, and inserting new ones. Combined with rule-engine-style truth maintenance, this means developers can simply declare their logic and let the engine maintain state and consistency.
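Here is a small REPL sketch of what that truth maintenance buys us; the namespace name and the TemperatureReading constructor’s field order are assumptions for illustration:
(def session
  (-> (mk-session 'clara.examples.temperature)              ; hypothetical namespace holding the rules above
      (insert (->TemperatureReading 105 1 :assembly-line-3)) ; value, timestamp, location (assumed order)
      (fire-rules)))

;; The engine logically inserted a CurrentTemperature for the location.
;; Retracting the supporting reading withdraws that derived fact as well,
;; with no explicit cleanup on our part.
(-> session
    (retract (->TemperatureReading 105 1 :assembly-line-3))
    (fire-rules))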

Integration with Processing Topologies

The rules used in this example are here, and are run with the Storm launching code here. There is also a draft Java API to attach rules to a topology. Note that our approach is simply to attach to a Storm topology defined via a provided TopologyBuilder, so users can pre-process or perform other logic in their topology and route data as appropriate into the distributed rule engine. Also, these examples use Clojure records, but everything works equally well with Java objects, including ones generated by Thrift, Avro, or Protocol Buffers.

Current State

A prototype of rules over Storm is in the clara-storm project. It also includes the ability to run queries across the rule engine’s working memory using Storm’s distributed RPC mechanism. A handful of things need to come together to make this production-ready, including:

  • I’d like input and suggestions from members of the Storm community. This topology layout isn’t an idiomatic use of Storm, so we need to ensure this won’t run into problems as we scale.  (This is one of the reasons I’m posting this now.)
  • The ability to persist Clara’s working memory to recover from machine failures. This will probably take the form of writing state changes for each rule node to a reliable write-ahead log, with Kafka being a good storage mechanism.
  • Optimizations are needed, ranging from efficient serialization to performing partial aggregations before sharing state between bolts.
  • Consider temporal operators in Clara. Accumulators have worked well to this point but may run into limits.
  • Testing at scale!
The biggest takeaway is how technologies like Storm and Clojure provide an opportunity to express computation with higher-level abstractions. Things like SummingBird (and Cascalog 2.0?) offer ways to query data streams. These could be complemented by support for forward-chaining rules for problems easily expressed that way.

Rules as a Control Structure

Rule engines seem to draw love-or-hate reactions. On one hand they offer a simple way to manage lots of arbitrary, complex, frequently-changing business logic. On the other, their simplicity often comes with limitations, and edge cases pop up that can’t be elegantly solved within the confines of rules. There are few things more frustrating than a tool meant to help you solve problems actually creating them.

The tragedy is that excellent ideas for modeling logic with rules have been hijacked by a myth: that it’s possible to write code – to unambiguously define logic in a textual form – without actually writing code. We see authoring tools generating rules in limited languages (or XML!), making the case that domain experts can author logic without development expertise. The shortage of good developers makes this tremendously appealing, and this demand has drawn supply.

If you have a limited problem space satisfied by such tools, then great. But problems remain:

  • Limited problem spaces often don’t stay limited.
  • Many problems involving arbitrary domain knowledge are best solved with rules when we can, but require the ability to integrate with a richer programming environment when we must.
So how do we approach this? We need to stop thinking of rule engines as external systems that create artificial barriers around our logic, and start treating them as first-class constructs seamlessly integrated into the host language. In other words, rule engines are best viewed as an alternate control structure, suited to the business problem at hand.

Clojure is uniquely positioned to tackle this problem. Macros make sophisticated alternate control structures possible, Clojure’s rich data structures make it suitable for solving many classes of problems, and its JVM integration makes it easy to plug into many systems. This is the idea behind Clara, a forward-chaining rules implementation in pure Clojure.

Here’s an example from the Clara documentation. In a retail setting with many arbitrary, frequently changing promotions, we might author them like this:
(defrule free-lunch-with-gizmo
  "Anyone who purchases a gizmo gets a free lunch."
  [Purchase (= item :gizmo)]
  =>
  (insert! (->Promotion :free-lunch-with-gizmo :lunch)))
And create a query to retrieve promotions:
(defquery get-promotions
  "Query to find promotions for the purchase."
  []
  [?promotion <- Promotion])
All of this is usable with idiomatic Clojure code:
(-> (mk-session 'clara.examples.shopping) ; Load the rules.
    (insert (->Customer :vip)
            (->Order 2013 :march 20)
            (->Purchase 20 :gizmo)
            (->Purchase 120 :widget)) ; Insert some facts.
    (fire-rules)
    (query get-promotions))
The resulting query returns the matching promotions. More sophisticated examples may join multiple facts and query by parameters; see the developer guide or the clara-examples project for more.
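As a quick sketch of a parameterized query, assuming the Promotion record carries a type field as in the insert above:
(defquery get-promotions-by-type
  "Find promotions of a given type for the purchase."
  [:?type]
  [?promotion <- Promotion (= ?type type)])

;; Bind the parameter when running the query against a fired session
;; (here, the session produced by the pipeline above).
(query session get-promotions-by-type :?type :lunch)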

Each rule constraint and action – the left-hand and right-hand sides – is simply a Clojure expression that can contain arbitrary logic. We also benefit from other advantages of Clojure. For instance, Clara’s working memory is an immutable, persistent data structure. Some of the advantages of that may come in a later post.

Rules by domain experts

So we’ve broken down some traditional barriers in rule engines, but it seems like this approach comes with a downside: by making rules a control structure in high-level languages, are we excluding non-programmer domain experts from authoring them?

We can expand our rule authoring audience in a couple ways:

  1. Encapsulate rules into their own files editable by domain experts, yet compiled into the rest of the system. An audience savvy enough to work with, say, Drools can understand the above examples and many others.
  2. Generate rules from higher-level, domain-specific macros (see the sketch below). Business logic could be modeled in a higher-level declarative structure that creates the rules at compile time. Generating rules is actually simpler than most logic generation, since rule ordering and truth maintenance are handled by the engine itself.
  3. Tooling to generate rules directly or indirectly. Like all Lisp code, these rules are also data structures. In fact, they are simpler to work with than an arbitrary s-expression because they offer more structure: a set of facts used by simple constraints resulting in an action, which could also contribute more knowledge to the session’s working memory. 
Ultimately, all of this results in Clojure code that plugs directly into a rich ecosystem. 
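As a sketch of the second option, a domain-specific macro could expand straight into defrule. This hypothetical defpromotion assumes the Purchase and Promotion records and clara.rules are referred in the namespace where the macro is defined:
(defmacro defpromotion
  "Hypothetical sketch: describe a promotion at a high level and expand
   it into an ordinary Clara rule."
  [promo-name item-kw reward-kw]
  `(defrule ~promo-name
     ~(str "Anyone who purchases a " (name item-kw) " gets " (name reward-kw) ".")
     [Purchase (= ~'item ~item-kw)]
     ~'=>
     (insert! (->Promotion ~(keyword promo-name) ~reward-kw))))

;; Expands into a rule equivalent to free-lunch-with-gizmo above.
(defpromotion free-lunch-with-gizmo :gizmo :lunch)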

These options will be fun to explore, but this isn’t my initial target. Let’s first create a useful tool for expressing complex logic in Clojure. Hopefully this will become a basis for exploration, borrowing good ideas for expressing business rules and making them available in many environments via the best Lisp available.

If this seems promising, check out the Clara project on github for more. I’ll also post updates on twitter at @ryanbrush.

Update: see the Clojure Google Group for some discussion on this topic.

A Long Time Ago, We Used to Be Friends

No one ever intends to let their blog go dark, but here we are. I’m planning on dusting off this blog to write about some new personal projects, but for now I thought I’d link to writing I’ve done in other venues:

I’ve written a few posts for Cerner’s Engineering blog as part of my work at Cerner.
Some of this content is from talks that I’ve given at Hadoop World, ApacheCon, and StampedeCon.

Some ideas originally posted to this blog have been published in the book 97 Things Every Programmer Should Know.

My latest personal project has been the construction of a forward-chaining rules engine in Clojure. I expect this and related problems to be the emphasis of this blog moving forward.

Start by Embracing Your Limits

Nothing in human history has offered as much promise, yet seen as many failures, as software. We’ve all seen moments of greatness, where a program seems like magic – but such gems are surrounded by minefields of bugs and indecipherable interfaces.

The result of all this is that we programmers are often a frustrated bunch. But should we be? After all, what makes us think that as a species we should have the aptitude to create great software? Our skill sets evolved in an environment that favored those who could hunt boar and find berries – any capacity to succeed in the abstract world of software is a pure, accidental side effect. Perhaps we should be awed by software’s successes rather than frustrated by its failures.

The good news is we can improve despite our limitations, and it starts with this: accept that we generally have no innate ability to create great systems, and design our practices around that. It seems like every major step forward in software has followed this pattern of embracing our limitations. For instance, we move to iterative development since we can’t anticipate all variables of a project. We aggressively unit test because we realize we’re prone to error. Libraries derived from practical experience frequently replace those built by expert groups. The list goes on.

This type of admission is humbling, but it can also be liberating. Here’s an example: in years past I would spend hours agonizing over an internal design decision for a system I was building. I figured if I got it right we could easily bolt on some new feature. Sometimes I was right, but often I was not. My code was often littered with unnecessary structure that only made things more complicated.

Contrast that to today: I know I can’t anticipate future needs in most cases, so I just apply this simple heuristic:
  1. When in doubt, do the simplest thing possible to solve the problem at hand.
  2. Plan on refactoring later.
The first step frees us from trying to anticipate all future needs – but this is not quick-and-dirty cowboy coding. An essential element of good code is that it creates an understandable and maintainable system. Don’t try to code for future needs. Instead, structure code for present needs so it can be leveraged in the future.

So how do we do this? A couple of things to keep in mind:
  • When in doubt, leave it out. (also known as “You Ain’t Gonna Need It”)
  • Unit-testable designs tend to be reusable designs. Unit tests not only catch bugs that can result from refactoring; they encourage the modularity that makes refactoring possible.
  • Don’t try to design an API independently of an application. You won’t understand your users’ needs well enough to create a good experience. Build the API as part of the application to make sure its needs are met, then factor out and generalize.
  • Group together code that tends to change for similar reasons. If your Widget class is responsible for rendering its UI, writing to the database, and business logic, you can’t use it anywhere else without significant changes. High cohesion and loose coupling.
There are no hard-and-fast rules to building software, but we all need a compass to help guide us through the thousands of micro-decisions we make every time we write code. Hopefully this post can help build that compass.

Can “Agile in the Large” Succeed?

Agile Development isn’t perfect, but it got something right. Right enough, at least, to gain enough momentum to become a buzzword – and for consultants to latch on by selling “Agile” to big enterprises. The result is “Agile in the Large”.

Agile in the Large ranges from the Half-Arsed Agile Manifesto to Cargo Cult practices. Too often we simply bolt on a few agile practices without understanding how they provide value. Adding scrums and iteration reviews to your current process and expecting improvement is like building a runway in your back yard and expecting planes to land.

Whether it’s labelled agile or not, a cohesive development team works for a couple of reasons. It creates a focused team fully aligned to a common goal. It avoids “Us and Them” mentalities, enabling everyone to adapt to meet the goal. And it offers a self-correcting strategy for the thousands of micro-decisions made when building software: the code must be testable, “You Ain’t Gonna Need It” applies, and quick feedback ensures you’re solving the actual problem.

Agile in the Large is flawed because it often fails to achieve these basic ingredients. Take Scrum of Scrums, for example. It tries to coordinate and resolve technical dependencies between teams by having representatives from each team get together and talk through them. It’s better than nothing, but an emerging system has too much ambiguity: the interaction between components is still being defined, bugs can lead to finger pointing, and the important sense of ownership is lost. Everyone feels like a cog in a machine rather than someone solving an important problem for a real user. Our ability to adapt the micro-decisions that build great software is lost.

In fact, Agile in the Large seems doomed from the beginning. Once you’re beyond some number of people in a single project, it’s impossible to create that sense of shared ownership and adaptability. Great software is created by small, dedicated teams.

So, what then? It seems the only way out is to make Agile in the Large more like Agile in the Small. A team should be working on one and only one project, and include everything necessary for that project to be successful. How we do that depends on where we are in a project’s maturity curve.

New vs. mature systems
New development is the most easily handled. Look at the problems at hand, and make sure the team is equipped to handle them end-to-end. This new project may have several components, but now is not the time to split those components into their own teams; keep a single team aligned to the user’s goals. After all, if a component isn’t aligned to some user’s goal, what is its value? Organize around components too early, and team cohesion is lost.

So we are off to a good start, but we need to adjust our strategy as a successful project matures. You may find that parts of your system have value for other uses. This pattern seems timeless: C was first a successful tool for building Unix, Ruby on Rails was an application before it was a framework, Amazon was an online book store before it was a cloud platform, and so on. In each case a successful system was built, and then groups grew up around the reusable pieces. Reusable technology will naturally arise from a successful project. Embrace that, but don’t force it.

Open source as a model for reuse
Later in a system’s life we find ourselves consuming many assets from a variety of teams. Now it’s easy to let coordination and communication friction kill our focused project. Fortunately, the open source world gives us the answer.

Quite simply, needed enhancements to common assets should be written and contributed by consuming teams. This offers several advantages over logging an enhancement request. For instance, we reduce deadline haggling and misaligned schedules. We also reduce the need for frequent status updates and opportunities for miscommunication between teams.

Of course, not all changes to open source projects come in the form of patches. There should still be a team around the asset in question, responsible for its architecture and fundamental direction. That team also operates as a gatekeeper, ensuring all patches are properly unit tested, documented, and don’t threaten the integrity of the system.

Changes must be handled on a case-by-case basis, but the primary mode of operation should follow the agile ideal: a single team with responsibility for a project end-to-end. This includes contributing to assets it consumes.

What if even the initial project is too big?
I’ll touch on one final question: what if even the initial scope of a project is beyond what a single team can accomplish? Find a smaller initial scope, get it working, and grow from there. Do this even if the initial scope means building placeholders that are discarded in the final result. The time saved and friction eliminated by creating a single, focused team will outweigh this. Think of it as scaffolding that eventually gets removed.

In the end, Agile in the Large only works if we make it more like Agile in the Small. Hopefully this article is a step in that direction.

Beware of the Flying Car

Every so often we developers are asked to build a flying car. Our users obviously need one to avoid traffic and get to meetings on time. So we set down the path to meet those needs.

The trouble is most of us have no idea how to build a flying car. Even so, this is the project, so we had better get started! Our first release of the flying car will be small, and carried around on a string. We show good progress toward our users’ needs and gain approval from our superiors. We will simply remove the string in a later release.

Hopefully most of us will recognize a Flying Car Project and take time to understand and address the users’ goals rather than sprinting toward a brick wall. The Flying Car is a means, not an end in itself – and there are other, executable means to the desired end.

Of course, there will always be someone eager to go build that flying car. If that happens, try to steer them in the right direction. If all else fails, just don’t be standing underneath it when they cut the string.

And We’re Back

After an unbelievably long hiatus, I’m going to start blogging again. It’s funny how becoming a parent makes everything else seem to go away for a while.

I don’t expect I will ever post at regular intervals here. I’ll post when I feel like I can express something that gets closer to some truth about software – at least to me. How often will that happen? Who knows?

The rebirth of this blog will come with a shift in material, at least for the near future. I’ve recently become more interested in the social aspect of building software. How should we organize ourselves to create great software? How should that change over time?

I am and always will be a programmer at heart. My shift in emphasis simply comes from the realization that our biggest challenges aren’t technical. They’re social.

The Guru Myth

Anyone who has worked in software long enough has heard questions like this:

I’m getting exception XYZ. Do you know what the problem is?

The questioner didn’t bother to include a stack trace, an error log, or any context leading to the problem. He or she seems to think you operate on a different plane, that solutions appear to you without analysis based on evidence. This person thinks you are a guru.

We expect such questions from those unfamiliar with software; to them systems can seem almost magical. What worries me is seeing this in the software community. Similar questions arise in program design, such as “I’m building inventory management. Should I use optimistic locking?” Ironically, the person asking the question is often better equipped to answer it than the question’s recipient. The questioner presumably knows the context, knows the requirements, and can read about the advantages and disadvantages of different strategies. Yet this person expects an intelligent answer without supplying context. He or she expects magic.

It’s time for the software industry to dispel this guru myth. “Gurus” are human; they apply logic and systematically analyze problems like the rest of us. Consider the best programmer you’ve ever met: At one point he or she knew less about software than you do now. If that person seems like a guru, it’s because of years dedicated to learning and refining thought processes. A “guru” is simply a smart person with relentless curiosity.

Of course, there remains a huge variance in natural aptitude. Many hackers out there are smarter, more knowledgeable, and more productive than I may ever be. Even so, debunking the guru myth has a positive impact. For instance, when working with someone smarter than me I am sure to do the legwork, to provide enough context so that person can efficiently apply his or her skills. Removing the guru myth also means removing a perceived barrier to improvement. Instead of a barrier I see a continuum on which I can advance.

Finally, one of software’s biggest obstacles is smart people who purposefully propagate the guru myth. This might be done out of ego, or as a strategy to increase one’s value as perceived by a client or employer. Ironically this attitude makes a smart person less valuable, since they don’t contribute to the growth of their peers. We don’t need gurus. We need experts willing to develop other experts in their field. There is room for all of us.