Too Much Code

Not enough cohesion

Insta-Declarative DSLs!

Recently I’ve focused on reclaiming rules for developers, treating rules as a developer-friendly way to untangle complex logic. Yet some problems call for policy changes without the involvement of developers. We need a simple way to write simple rules.

One approach is to offer a minimal Domain-Specific Language focused on the policy our users need to change. In this post we take a simple example, write a DSL, parse it, validate it, and run it against some data. We’ll use the excellent Instaparse library to define the grammar and create the parse tree, and convert that tree into rules executable with Clara.

First we figure out how we want our DSL to look. To keep things simple, let’s imagine a retail setting and let our business user define promotions and discounts based on customer and order information. An example might look like this:

discount my-discount 15 when customer status is platinum.
discount extra-discount 10 when customer status is gold and total value > 200.
promotion free-widget-month free-widget when customer status is gold and order month is august.

Rule engines are a good fit for declarative DSLs because rule engines themselves are declarative. We can see the rule-like structure in the above example: apply this policy when that set of conditions is true.

Now we need to write a function that converts our friendly DSL into rules we can run. Fortunately, in Clara rules are data, so our function needs to produce a simple data structure rather than generating rules using string manipulation. Using Clojure and Prismatic Schema to define the structure, our function looks like this:

(s/defn load-user-rules :- [clara.rules.schema/Production]
  "Converts a business rule string into Clara productions."
  [business-rules :- s/Str]
  ;; TODO: convert DSL to Clara rule structures.
  )

So let’s implement it! First we use Instaparse to define our grammar. We can start with the major productions and break down their contents. So the discount production would look like this:

DISCOUNT = <'discount'> NAME PERCENT <'when'> CONDITION [<'and'> CONDITION]* <'.'>;

And it contains a series of conditions, like this:

CONDITION = FACTTYPE FIELD OPERATOR VALUE ;

And so on. Here is the complete grammar we will use, which we simply bring into our Clojure session:

(require '[instaparse.core :as insta])

(def shopping-grammar
  (insta/parser
   "<RULES> = [DISCOUNT | PROMOTION]+
    PROMOTION = <'promotion'> NAME PROMOTIONTYPE <'when'> CONDITION [<'and'> CONDITION]* <'.'>;
    DISCOUNT = <'discount'> NAME PERCENT <'when'> CONDITION [<'and'> CONDITION]* <'.'>;
    <PERCENT> = NUMBER ;
    PROMOTIONTYPE = STRING ;
    <NAME> = STRING ;
    NUMBER = #'[0-9]+' ;
    <STRING> = #'[A-Za-z][A-Za-z0-9_-]+' ;
    CONDITION = FACTTYPE FIELD OPERATOR VALUE ;
    FACTTYPE = 'customer' | 'total' | 'order' ;
    <FIELD> = STRING ;
    OPERATOR = 'is' | '>' | '<' | '=' ;
    <VALUE> = STRING | NUMBER ;
    "
   :auto-whitespace :standard))

The angle brackets indicate productions that should be omitted from the abstract syntax tree and replaced by their children. For instance, because NAME is hidden, the parse tree contains the bare string "my-discount" rather than a [:NAME "my-discount"] node. This isn’t strictly necessary, but it simplifies things when we transform the tree.

The insta/parser function actually returns a function that converts input into a syntax tree! So we can just call it with our DSL and pretty-print the results:

(clojure.pprint/pprint
 (shopping-grammar
  "discount my-discount 15 when customer status is platinum.
   discount extra-discount 10 when customer status is gold and total value > 200.
   promotion free-widget-month free-widget when customer status is gold and order month is august."))

Which produces this:

([:DISCOUNT
  "my-discount"
  [:NUMBER "15"]
  [:CONDITION
   [:FACTTYPE "customer"]
   "status"
   [:OPERATOR "is"]
   "platinum"]]
 [:DISCOUNT
  "extra-discount"
  [:NUMBER "10"]
  [:CONDITION [:FACTTYPE "customer"] "status" [:OPERATOR "is"] "gold"]
  [:CONDITION
   [:FACTTYPE "total"]
   "value"
   [:OPERATOR ">"]
   [:NUMBER "200"]]]
 [:PROMOTION
  "free-widget-month"
  [:PROMOTIONTYPE "free-widget"]
  [:CONDITION [:FACTTYPE "customer"] "status" [:OPERATOR "is"] "gold"]
  [:CONDITION [:FACTTYPE "order"] "month" [:OPERATOR "is"] "august"]])

Just for fun, let’s see what happens when a user mistypes some input. Let’s say “customer” is misspelled when we evaluate the input against our grammar. So running this:

(println
 (shopping-grammar
  "discount my-discount 15 when customeer status is platinum."))

prints out this:

Parse error at line 1, column 30:
discount my-discount 15 when customeer status is platinum.
                             ^
Expected one of:
order
total
customer

Great! The error is pretty clear and gives the user options for how to fix it. Instaparse does a great job at this.

Alright, let’s get back on track and imagine our user fixed the error. We now have a nice parse tree…we just need to convert it into rules. One way to do this is to write a function that maps over each top-level production and returns a Clara rule. This is a fine approach, and may be a better fit depending on the transformation needed. But in this case I’m going to take advantage of another feature of Instaparse: the ability to apply arbitrary transformations to productions in the tree.

The simplest example is that we want to replace productions like [:NUMBER “15”] with…the actual number 15. This tends to be useful for things like, you know, math.

So let’s run a production through our grammar and use the insta/transform function, which takes a map of transformations keyed by production. We use Clojure’s threading macro to make wiring the functions together more readable:

(->> "discount my-discount 15 when customer status is platinum."
     (shopping-grammar)
     (insta/transform {:NUMBER #(Integer/parseInt %)})
     (clojure.pprint/pprint))

This transforms our tree into this, where we have a number rather than an AST production:

([:DISCOUNT
  "my-discount"
  15
  [:CONDITION
   [:FACTTYPE "customer"]
   "status"
   [:OPERATOR "is"]
   "platinum"]])

So we’ve taken our first step of transforming our tree into an actual, executable rule! Now we need to do some more transformations:

  • :OPERATOR gets transformed to a Clojure comparison function
  • :FACTTYPE gets transformed to a Clojure type. In this case we just use Clojure records.
  • :PROMOTIONTYPE is an enumeration, which we idiomatically transform to a Clojure keyword
  • :CONDITION gets transformed into the left-hand side expression of a rule
  • :DISCOUNT and :PROMOTION get transformed into actual Clara rules, built on the transformations above! These match the clara.rules.schema/Production Prismatic Schema.

I find it’s best to build this type of logic from the bottom up in a REPL or a REPL-connected editor. Just start with the simplest transformations, like :NUMBER and :OPERATOR, make sure they work in the REPL, then work on the transformations that use them. I also found myself tweaking the grammar to omit or hide unnecessary productions. After a few quick iterations I ended up with this:

(def operators {"is" `=
                ">" `>
                "<" `<
                "=" `=})

(def fact-types
  {"customer" Customer
   "total" Total
   "order" Order})

(def shopping-transforms
  {:NUMBER #(Integer/parseInt %)
   :OPERATOR operators
   :FACTTYPE fact-types
   :CONDITION (fn [fact-type field operator value]
                {:type fact-type
                 :constraints [(list operator (symbol field) value)]})

   ;; Convert promotion strings to keywords.
   :PROMOTIONTYPE keyword

   :DISCOUNT (fn [name percent & conditions]
               {:name name
                :lhs conditions
                :rhs `(insert! (->Discount ~name ~percent))})

   :PROMOTION (fn [name promotion-type & conditions]
                {:name name
                 :lhs conditions
                 :rhs `(insert! (->Promotion ~name ~promotion-type))})})

That’s it! This works because the transformations build on top of lower-level transformations. For instance, the :CONDITION transformation is given a fact-type and an operator because those were transformed by the :FACTTYPE and :OPERATOR transformations, respectively. Users could choose to leave out lower-level transformations and have :CONDITION do all of the work, but the above approach shows the power of this Instaparse feature.

Also note our use of the Clojure syntax quote (`) and unquote (~). These are typically used when writing macros, but they’re convenient in this case to build expressions that Clara turns into rules. (After all, Clara is really just a big macro that converts user expressions into a Rete network!)
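For instance, evaluating just the quoted form at the REPL (in the example namespace) shows what the :DISCOUNT transform builds for its right-hand side:

;; Syntax quote qualifies the symbols; unquote splices in runtime values.
(let [name "my-discount"
      percent 15]
  `(insert! (->Discount ~name ~percent)))
;; => (clara.rules/insert! (clara.examples.insta/->Discount "my-discount" 15))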

Now let’s run this set of transformations against our input data:

(->> "discount my-discount 15 when customer status is platinum.
      discount extra-discount 10 when customer status is gold and total value > 200.
      promotion free-widget-month free-widget when customer status is gold and order month is august."
     (shopping-grammar)
     (insta/transform shopping-transforms)
     (clojure.pprint/pprint))

This produces:

({:name "my-discount",
  :lhs
  ({:type clara.examples.insta.Customer,
    :constraints [(clojure.core/= status "platinum")]}),
  :rhs
  (clara.rules/insert!
   (clara.examples.insta/->Discount "my-discount" 15))}
 {:name "extra-discount",
  :lhs
  ({:type clara.examples.insta.Customer,
    :constraints [(clojure.core/= status "gold")]}
   {:type clara.examples.insta.Total,
    :constraints [(clojure.core/> value 200)]}),
  :rhs
  (clara.rules/insert!
   (clara.examples.insta/->Discount "extra-discount" 10))}
 {:name "free-widget-month",
  :lhs
  ({:type clara.examples.insta.Customer,
    :constraints [(clojure.core/= status "gold")]}
   {:type clara.examples.insta.Order,
    :constraints [(clojure.core/= month "august")]}),
  :rhs
  (clara.rules/insert!
   (clara.examples.insta/->Promotion
    "free-widget-month"
    :free-widget))})

Now we have a sequence of rules we can run! We can pass this directly into the mk-session function and create an actual rule session!
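This also gives us what we need to fill in the load-user-rules stub from the start of the post. A minimal sketch, assuming the shopping-grammar and shopping-transforms defined above (the version in clara-examples may differ in details such as error handling):

(s/defn load-user-rules :- [clara.rules.schema/Production]
  "Converts a business rule string into Clara productions."
  [business-rules :- s/Str]
  (vec
   (->> business-rules
        (shopping-grammar)
        (insta/transform shopping-transforms))))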

We can also combine these rules with others written by Clara’s defrule or generated from some other source. You can see the full code in the clara.examples.insta namespace in the clara-examples project, but here is the pertinent segment for running our rules:

(let [session (-> (mk-session 'clara.examples.insta (load-user-rules example-rules))
                  (insert (->Customer "gold")
                          (->Order 2013 "august" 20)
                          (->Purchase 20 :gizmo)
                          (->Purchase 120 :widget)
                          (->Purchase 90 :widget))
                  (fire-rules))]

  (clojure.pprint/pprint (query session get-discounts))
  (clojure.pprint/pprint (query session get-promotions)))

Running this produces the following output:

({:?discount {:name "extra-discount", :percent 10}})
({:?discount {:reason "free-widget-month", :type :free-widget}})

And that’s it! The complete code for this is in clara-examples. Details on Instaparse can be found on the Instaparse github page and Clara documentation is at clara-rules.org. You can also reach me on twitter @ryanbrush.

Finally, we once again see how powerful Clojure’s composable design is. The Instaparse and Clara libraries were built completely independently, but since both use functional transformations of immutable data structures we were able to combine them to create something useful in a small amount of code. Plus hacking on this stuff is just plain fun.

UPDATE: I posted an answer to a follow up question to this thread in the Clara Google group. If there are other topics, please feel free to use that thread or create a new one.

Micro Benchmarks versus Macro Optimizations in Clojure

Here’s a simple question Clojure users hear often:

What is the overhead of Clojure’s persistent data structures?

I ran into this question headlong when profiling and tuning Clara. Clara aims to draw ideas from expert systems into the Clojure ecosystem, making them available as first-class Clojure idioms. Therefore Clara’s working memory works like other Clojure data structures: it is immutable and “updates” create a new, immutable working memory.

Anyone skeptical of the benefits of immutability should go watch Rich Hickey’s talks like Simple Made Easy. Yet these advantages are moot if the resulting system doesn’t perform well enough. So we have a challenge: we know that persistent Clojure structures will lose a micro benchmark comparison to mutable counterparts, but can we balance that with macro optimizations made possible by immutability? The answer is yes, with the techniques below working for Clara and probably for many other projects.

Optimizing Clojure Code

Optimizations should have an objective. My objective with Clara was to make performance at least competitive with the latest version of Drools, which may be used to solve similar problems. Clara’s basis in Clojure offers a number of advantages, but we need to make sure performance isn’t a barrier. So I created the clara-benchmark project, using Criterium to benchmark a number of flows in both Clara and Drools. Some findings:

It’s All About the Algorithms

The first run of profiling didn’t look good. Clara was almost ten times slower than Drools for some cases. But it turns out the bulk of this cost had nothing to do with Clojure — my variation of the Rete algorithm was inefficient, indexing facts for join operations that could never occur due to the nature of the rules. In short, my algorithm sucked.

The good news is this was easily exposed with a profiling tool and fixed with little effort. I find algorithms in a language like Clojure to be easier to understand and tune because they are expressed as simple transformations of data. We know the structures we receive and return, and simply need to identify the most efficient way to express the transformation. This is a major contrast to systems that force us to keep track of a bunch of additional state when working through the system.

A better algorithm was the biggest single improvement, bringing Clara within twice Drools’ performance or better for the use cases tested. But we’re not done yet.

Strongly Isolated Mutability

A better algorithm got us close, but further profiling revealed a bottleneck for some use cases. Rete engines often perform joins over common facts, like this simple example:

(defquery orders-by-customer-id
  "Returns the orders for the given customer."
  [:?id]
  [Order (= ?id customerId) (= ?total total)]
  [Customer (= ?id id)])

This queries the working memory to find customers and orders with the same id. (See the Clara documentation for details on use.)

Clara makes extensive use of Clojure’s group-by function to group collections of facts by matching keys. After tuning my algorithm, I discovered that some benchmarks were spending the bulk of their time in group-by. The group-by implementation can be found in the Clojure source, but here it is for convenience:

(defn group-by
  "Returns a map of the elements of coll keyed by the result of
  f on each element. The value at each key will be a vector of the
  corresponding elements, in the order they appeared in coll."
  {:added "1.2"
   :static true}
  [f coll]
  (persistent!
   (reduce
    (fn [ret x]
      (let [k (f x)]
        (assoc! ret k (conj (get ret k []) x))))
    (transient {}) coll)))

Notice the use of Clojure transients, which are mutable structures designed to be used locally for efficiency. Clojure takes a pragmatic step here. The goal is to keep our systems easy to reason about, and we can achieve that if no external observer can detect mutability. group-by is a pure function working with immutable structures for all observers, but gains performance by using transients internally.
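The same pattern is easy to apply in our own code. Here is a small sketch (not from Clara) of a pure function that uses a transient internally but exposes only immutable values:

(defn filter-evens
  "Returns a vector of the even numbers in coll. Callers only ever see
   immutable values; the transient vector never escapes this function."
  [coll]
  (persistent!
   (reduce (fn [acc x]
             (if (even? x) (conj! acc x) acc))
           (transient [])
           coll)))

;; (filter-evens (range 10)) => [0 2 4 6 8]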

The trouble I ran into with my benchmarks is that I had many items mapping to the same key. Notice that Clojure’s group-by uses a transient map, but that map contains non-transient vectors. So the performance bottleneck arose because this group-by function wasn’t “transient enough” for my particular data.

I worked around this by writing an alternate group-by that better fit my needs. Its internals are hideous but are the result of profiling a couple implementations:

(defn tuned-group-by
  "Equivalent of the built-in group-by, but tuned for when
   there are many values per key."
  [f coll]
  (->> coll
       ;; Create a mutable map of transient vectors.
       (reduce (fn [map value]
                 (let [k (f value)
                       items (or (.get ^java.util.HashMap map k)
                                 (transient []))]
                   (.put ^java.util.HashMap map k (conj! items value)))
                 map)
               (java.util.HashMap.))
      ;; Make the vectors immutable and put them into a transient map.
      (reduce (fn [map [key value]]
                  (assoc! map key (persistent! value)))
                (transient {}))
      ;; Make the map itself immutable.
      (persistent!)))

This is more efficient when there are many items that map to the same key in the returned map, since it uses transient values. (The Java HashMap turned out to be the fastest option to build the result here, but it never escapes this function.) This optimization cut some benchmark times in half. Combined with a number of smaller tweaks, this brought most of Clara’s use cases in line with Drools’ performance. For other use cases Clara significantly outperforms Drools, but we’ll get to those later.

My tuned-group-by function is faster than Clojure’s group-by for some inputs and slower for others. But this misses a bigger advantage: Clojure’s philosophy of separating functions and data made swapping implementations trivial, allowing users to pick the right ones for their specific needs. This isn’t so easily done if functions are strongly tied to the data they work with, which is an easy pitfall of object-oriented programming.
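Since both functions take a key function and a collection and return the same shape of map, swapping one for the other is a one-line change at the call site (facts here is a hypothetical collection):

;; Drop-in replacement: same contract as clojure.core/group-by.
(tuned-group-by :type facts)   ; instead of (group-by :type facts)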

Referential Transparency Breeds Serendipity

Writing functional code tends to create pleasant surprises. We come across significant advantages that wouldn’t be possible with a different approach. Consider the following Clara example and its Drools equivalent:

(ns clara.benchmark.visit-order-same-day
  (:require [clara.rules :refer :all]
            [clj-time.coerce :as coerce])
  (:import [clara.benchmark.beans Order Visit]))

(defquery same-day-visit
   "Queries orders that occurred the same day as a visit."
   []
   [Order (= ?id customerId) (= ?day (coerce/to-local-date time))]
   [Visit (= ?id customerId) (= ?day (coerce/to-local-date time))])

Drools equivalent:

package clara.benchmark.drools.visit_order_same_day;

import clara.benchmark.beans.Order;
import clara.benchmark.beans.Visit;

import org.joda.time.DateTime;

query "same_day_visit"

   Order($id : customerId, $day : time.toLocalDate())
   Visit($id == customerId, $day == time.toLocalDate())

end

These rules simply identify things happening on the same day. Yet there is a big difference: Drools does an O(n²) Cartesian join where Clara does an O(n·log₃₂ n) indexed join of each item. Therefore Clara becomes dramatically faster than Drools for large inputs in this case. Also notice how Clara cleanly integrates with Clojure’s syntax, as opposed to embedding multiple syntaxes into a file, since it treats rules as a control structure.

This is possible because of Clojure’s emphasis on pure, referentially transparent functions. Since we can replace the function call with its result, we can build an index of that result. The outcome is a significantly more efficient system for this class of problem.
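To illustrate the idea outside the engine, here is a sketch (not Clara’s internal code) using hypothetical orders and visits collections whose elements hold a :time value:

(require '[clj-time.coerce :as coerce])

(defn same-day-pairs
  "Because to-local-date is pure, we can compute it once per fact and
   index by the result, turning the join into hash lookups."
  [orders visits]
  (let [orders-by-day (group-by #(coerce/to-local-date (:time %)) orders)
        visits-by-day (group-by #(coerce/to-local-date (:time %)) visits)]
    (for [[day os] orders-by-day
          v (get visits-by-day day)
          o os]
      [o v])))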

Along the same lines, rule engines’ facilities for reasoning over sets of facts can be implemented more efficiently under these constraints. Clara’s equivalents of Jess and Drools accumulators simply compile into Clojure reducers, making them more efficient than the alternatives by tapping into that feature.

These advantages arise often: we can defer computation to efficient batch operations. We can transparently spread work across threads without dealing with lock-based concurrency. We can memoize functions or build efficient caches based on fast reference comparison. Importantly, when starting a problem it’s not always obvious how these advantages will arise, but these techniques provide an opportunity for great optimizations at the macro level. David Nolen’s work on Om is a dramatic example of this in action.

The Trouble with Benchmarks

“X is faster than Y” makes for a great incendiary headline on Hacker News, but it doesn’t really make sense. X may be faster than Y for workload Z with trade-offs A, B, and C…but for some reason those headlines don’t get as many upvotes.

Benchmarks are informative and an important tool to improve our system. But they aren’t a real measure of a system’s quality or potential. A better measure is how easily a system can be understood, adapted, and expanded. If we can understand the nature of a problem, performance or otherwise, we can usually fix it. Clojure simply provides better mechanisms to understand and improve our systems than other languages I’ve used.

In short, the trouble with benchmarks is that they encourage treating symptoms rather than the explosion of complexity that limits what we can build.

Clara’s Future

All optimizations discussed here are in master and will be released in Clara 0.6.0 this summer. You can see some current comparisons with Drools in the clara-benchmark project. There are still opportunities for improvement in Clara, being a relatively new system. Probably the next significant optimization is greater laziness, which we’re tracking here.

Updates will be posted here and on my twitter feed. I’ll also be discussing modern approaches to expert systems, including Clara, at two conferences over the next few months: Midwest.io in July and strangeloop in September.

Clara 0.4 Released

Clara 0.4 has been released, and is available on Clojars! See the github page for usage information. I wrote about the significant features in the Rules as Data post.

The release announcement is in the Clojure Google Group.

This release puts Clara on a strong footing, and I’m looking forward to playing with the new rules-as-data features.

Rules as Data

This started as a topic for a Lambda Lounge meetup but seems worth sharing broadly.
I’ve posted previously about treating rules as a control structure, but for Clara 0.4 rules are now first-class data structures with well-defined schema.  The simple defrule syntax is preserved but the macro now produces a data structure that is used everywhere else. For example, the following code:
(defrule date-too-early
  "We can't schedule something too early."
  [WorkOrder (< (weeks-until date) 2)]
  =>
  (insert! (->ApprovalRequired :timeline "Date is too early")))
Defines a var containing this structure:
{:doc "We can't schedule something too early.",
 :name "date-too-early",
 :lhs
 [{:constraints [(< (llkc.example/weeks-until date) 2)],
   :type llkc.example.WorkOrder}],
 :rhs
 (clara.rules/insert!
  (llkc.example/->ApprovalRequired :timeline "Date is too early"))}
The rule structure itself is defined in Clara’s schema, and simply stored in the var with the rule’s name.

So, now that rules are data, we open the door to tooling to explore and manipulate them. For instance, we can visualize the relationship between rules. Here is a visualization of the rules used for the meetup. I arbitrarily chose shapes to distinguish the different types of data:



This image is generated by running the following function from clara.tools.viz against the example used at the meetup:
(viz/show-logic! 'llkc.example)
That function simply scans each rule in the given namespace, reads individual conditions from the rule data structure, and wires them up. The graph itself is rendered with GraphViz.

Since all rules are data, they can be filtered or manipulated like any Clojure data structure. So here we take our set of rules and only display those that handle logic for validation errors:
(viz/show-logic!
  (filter #(viz/inserts? % ValidationError)
    (viz/get-productions ['llkc.example])))
In this example there is only one rule that does validation, so the resulting image looks like this:

Rule engine as an API

The rules-as-data model creates another significant advantage: we have decoupled the DSL-style syntax from the rule engine itself. Clara users can now create rules from arbitrary sources, such as a specialized syntax, an external database, or a domain-specific file format. (Think Instaparse tied to rule generation.) Clara is evolving into a general Rete engine, with its “defrule” syntax being just one way to use it.

So far I’ve written only simple, GraphViz-based visualizations but we can expose these with more sophisticated UIs. Projecting such visualizations onto, say, a D3-based graph could provide a rich, interactive way of exploring logic.

At this point, Clara 0.4 is available as snapshot builds. I expect to do a release in February, pending some cleanup and enhancements to ClojureScript support. I’ll post updates on my twitter feed, @ryanbrush.

Crossing the (data) streams: scalable realtime processing with rules

Pipe-and-filter architectures are among the most successful design patterns ever. They dominate data ingestion and processing today, and give 1970s hackers yet another chance to remind us how they thought of everything years ago.

Unfortunately modern variations of this can run into an ugly problem: what are the semantics of a “join” operation between multiple infinite streams of data? Popular systems like Storm point out this ambiguity, as does Samza. Both provide primitives to support correlation of events between streams, but the higher-level semantics of a join are punted to the user.

This is fine for systems that provide infrastructure, but it is a troublesome model for our applications: if we can’t define clear and compelling semantics for a key part of our processing model, we might be using the wrong model. Projects like Storm offer an excellent infrastructure, but this ambiguity implies that many problems could be solved with a higher-level abstraction.

The challenge with user-defined join semantics is that it comes with baggage: maintaining state, structuring it, and recovering state after failure scenarios are challenging problems. It also makes the behavior of the system harder to understand. Since each join can have slightly different behavior, we need to look closely to see what’s going on. A better approach is needed. A set of simple yet flexible join operators would be ideal – so how do we get there?

We might consider CEP-style systems such as Esper and Drools Fusion, which have been tackling declarative-style joins for years. But such systems don’t offer the scalability or processing guarantees of Storm, and they use limited languages that aren’t always expressive enough for sophisticated logic.

We need a data processing model with well-defined, declarative joins while supporting rich application logic. There are lots of options here, but here I’ll focus on one: suppose we could make rules as a control structure scale linearly across a cluster, letting the rules engine distribute join operations. Let’s look at an experiment of making Clara, a Clojure-based rules engine, distribute its working memory and processing across a Storm cluster, with all of the scalability and processing guarantees of the underlying system.

Forward-chaining rules on Storm

Imagine a variant of the Rete algorithm implemented with some simple constraints:

First, each condition in a rule can be evaluated independently, so incoming data can be spread across an arbitrary number of processes and match rule conditions appropriately.

Second, aggregations over matched facts follow a map-and-reduce style pattern – where the map and partial reductions of aggregated facts can be done in parallel across machines.

Finally, “joins” of aggregations or individual facts are always hash-based. So joins can be efficiently achieved by sending matched facts to the same node via their hash values.

The result is our Storm topology looks something like this:


Let’s consider a simple example. Suppose we have feeds of temperature readings from multiple locations in some facility, and we want to take action in those locations should our readings exceed a threshold.

Each temperature reading has a logical timestamp, and since our logic is interested in the “newest” reading, we use a Clara accumulator that selects the item with the newest timestamp:
(def newest-temp (acc/max :timestamp :returns-fact true))
We then use it in a rule that processes all our readings for a location and preserves the newest:
(defrule get-current-temperature
  "Get the current temperature at a location by simply 
   looking at the newest reading."
  [?current-temp <- newest-temp :from [TemperatureReading (== ?location location)]]
  =>
  (insert! (->CurrentTemperature (:value ?current-temp) ?location)))
Note that accumulators preserve minimal state and apply changes incrementally. In this case we keep only the current temperature based on timestamp; lower values are simply discarded, so we can deal with an infinite stream.  Also, this example keeps the maximum, but we could easily accumulate some other value, such as a time-weighted histogram of temperatures so we’re robust to outliers. Any fact that doesn’t match a rule is simply discarded, incurring no further cost.

Now that we have the current temperature for each location, we want to back off our devices in those locations if a threshold is exceeded. We can write this as a simple rule as well:
(defrule reduce-device-speed
  "Reduce the speed of all devices in a location that has a high temperature."
  [CurrentTemperature (> value high-threshold)
                      (= ?location-id location)]
  ;; Find all Device records in the location, and bind them to the ?device variable.
  [?device <- Device (= ?location-id location)]
  =>
  (reduce-speed! ?device))
The first condition matches current temperatures that exceed the threshold and binds the location to the ?location-id variable. The second condition finds all devices with a matching location and binds them to the ?device variable. The ?device binding is then visible on the right-hand side of the rule, where we can take action.

This is effectively performing a join between temperatures that exceeded a threshold at a given location and devices in that same location. When running over Storm, this rule will hash Device and CurrentTemperature facts and send them to the same process using that hash value. This is done using Storm’s field-grouping functionality over a data stream that connects the bolt instances together.

All state for the join operations is managed internally by Clara’s engine. Accumulators like the example here compute in a rolling fashion, merging new data together, retracting previously accumulated values, and inserting new ones. Combined with rule engine-style truth maintenance, developers can simply declare their logic and let the engine maintain state and consistency.

Integration with Processing Topologies

The rules used in this example are here, and are run with the Storm launching code here. There is also a draft Java API to attach rules to a topology. Note that our approach is to simply attach to a Storm topology defined via a provided TopologyBuilder, so users can pre-process or perform other logic in their topology, and route data as appropriate into the distributed rule engine. Also, these examples use Clojure records, but they can work equally well with Java objects, including ones generated by Thrift, Avro, or Protocol Buffers.

Current State

A prototype of rules over Storm is in the clara-storm project. It also includes the ability to run queries across the rule engine’s working memory, using Storm’s Distributed RPC mechanism. A handful of things need to come together to make this production ready, including:

  • I’d like input and suggestions from members of the Storm community. This topology layout isn’t an idiomatic use of Storm, so we need to ensure this won’t run into problems as we scale.  (This is one of the reasons I’m posting this now.)
  • The ability to persist Clara’s working memory to recover from machine failures. This will probably take the form of writing state changes for each rule node to a reliable write-ahead log, with Kafka being a good storage mechanism.
  • Optimizations ranging from efficient serialization to doing partial aggregations prior to sharing state between bolts are needed.
  • Consider temporal operators in Clara. Accumulators have worked well to this point but may run into limits.
  • Testing at scale!
The biggest takeaway is how technologies like Storm and Clojure provide an opportunity to express computation with higher-level abstractions. Things like SummingBird (and Cascalog 2.0?) offer ways to query data streams. These could be complemented by support for forward-chaining rules for problems easily expressed that way.

Rules as a Control Structure

Rule engines seem to draw love-or-hate reactions. On one hand they offer a simple way to manage lots of arbitrary, complex, frequently-changing business logic. On the other, their simplicity often comes with limitations, and edge cases pop up that can’t be elegantly solved in the confines of rules. There are few things more frustrating than a tool that is meant to help you solve problems actually creating them.

The tragedy is that excellent ideas for modeling logic with rules have been hijacked by a myth: that it’s  possible to write code – to unambiguously define logic in a textual form – without actually writing code. We see authoring tools generating rules in limited languages (or XML!), making the case that domain experts can author logic without development expertise. The shortage of good developers makes this tremendously appealing, and this demand has drawn supply.

If you have a limited problem space satisfied by such tools, then great. But problems remain:

  • Limited problem spaces often don’t stay limited.
  • Many problems involving arbitrary domain knowledge are best solved with rules when we can, but require the ability to integrate with a richer programming environment when we must.
So how do we approach this? We need to stop thinking of rule engines as external systems that create artificial barriers around our logic, and start treating them as first-class constructs seamlessly integrated into the host language. In other words, rule engines are best viewed as an alternate control structure, suited to the business problem at hand.

Clojure is uniquely positioned to tackle this problem. Macros make sophisticated alternate control structures possible, Clojure’s rich data structures make it suitable for solving many classes of problems, and its JVM integration makes it easy to plug into many systems. This is the idea behind Clara, a forward-chaining rules implementation in pure Clojure.

Here’s an example from the Clara documentation. In a retail setting with many arbitrary, frequently changing promotions, we might author them like this:
(defrule free-lunch-with-gizmo
  "Anyone who purchases a gizmo gets a free lunch."
  [Purchase (= item :gizmo)]
  =>
  (insert! (->Promotion :free-lunch-with-gizmo :lunch)))
And create a query to retrieve promotions:
(defquery get-promotions
  "Query to find promotions for the purchase."
  []
  [?promotion <- Promotion])
All of this is usable with idiomatic Clojure code:
(-> (mk-session 'clara.examples.shopping) ; Load the rules.
    (insert (->Customer :vip)
            (->Order 2013 :march 20)
            (->Purchase 20 :gizmo)
            (->Purchase 120 :widget)) ; Insert some facts.
    (fire-rules)
    (query get-promotions))
The resulting query returns the matching promotions. More sophisticated examples may join multiple facts and query by parameters; see the developer guide or the  clara-examples project for more.
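For instance, a parameterized query that also joins on another fact might look like the following sketch (the status and type field names are assumptions for illustration, not necessarily the fields in the real example namespace):

(defquery get-promotions-by-type
  "Finds promotions of the given type, but only while a VIP customer
   is present in working memory."
  [:?type]
  [Customer (= status :vip)]
  [?promotion <- Promotion (= ?type type)])

;; Usage: (query session get-promotions-by-type :?type :lunch)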

Each rule’s constraints and actions – the left-hand and right-hand sides – are simply Clojure expressions that can contain arbitrary logic. We also benefit from other advantages of Clojure. For instance, Clara’s working memory is an immutable, persistent data structure. Some of the advantages of that may come in a later post.

Rules by domain experts

So we’ve broken down some traditional barriers in rule engines, but it seems like this approach comes with a downside: by making rules a control structure in high-level languages, are we excluding non-programmer domain experts from authoring them?

We can expand our rule authoring audience in a couple ways:

  1. Encapsulate rules into their own files editable by domain experts, yet compiled into the rest of the system. An audience savvy enough to work with, say, Drools can understand the above examples and many others.
  2. Generate rules from higher-level, domain-specific macros. Business logic could be modeled in a higher-level declarative structure that creates the rules at compile time (a minimal sketch of such a macro follows this list). Generating rules is actually simpler than most logic generation, since rule ordering and truth maintenance are handled by the engine itself.
  3. Tooling to generate rules directly or indirectly. Like all Lisp code, these rules are also data structures. In fact, they are simpler to work with than an arbitrary s-expression because they offer more structure: a set of facts used by simple constraints resulting in an action, which could also contribute more knowledge to the session’s working memory. 
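As a rough sketch of the second idea, a hypothetical defpromotion macro (not part of Clara) could expand a small domain-specific form into an ordinary defrule:

;; Hypothetical higher-level macro that expands into a plain defrule.
(defmacro defpromotion
  [rule-name promo-type & conditions]
  `(defrule ~rule-name
     ~@conditions
     ~'=>
     (insert! (->Promotion ~(keyword rule-name) ~promo-type))))

;; (defpromotion free-lunch-with-gizmo :lunch
;;   [Purchase (= item :gizmo)])
;; expands into essentially the free-lunch-with-gizmo rule shown earlier.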
Ultimately, all of this results in Clojure code that plugs directly into a rich ecosystem. 

These options will be fun to explore, but this isn’t my initial target. Let’s first create a useful tool for expressing complex logic in Clojure. Hopefully this will become a basis for exploration, borrowing good ideas for expressing business rules and making them available in many environments via the best Lisp available.

If this seems promising, check out the Clara project on github for more. I’ll also post updates on twitter at @ryanbrush.

Update: see the Clojure Google Group for some discussion on this topic.

A long time ago, we used to be friends

No one ever intends to let their blogs go dark, but here we are.  I’m planning on dusting off this blog to write about some new personal projects, but for now I thought I’d link to writing I’ve done in other venues:

I’ve written a few posts for Cerner’s Engineering blog as part of my work at Cerner. These include:
Some of this content is from talks that I’ve given at Hadoop World, ApacheCon, and StampedeCon.

Some ideas originally posted to this blog have been published in the book 97 Things Every Programmer Should Know.

My latest personal project has been the construction of a forward-chaining rules engine in Clojure. I expect this and related problems to be the emphasis of this blog moving forward.

Start by embracing your limits

Nothing in human history has offered so much promise yet seen as many failures as software. We’ve all seen moments of greatness, where a program seems like magic – but such gems are surrounded by minefields of bugs and indecipherable interfaces.

The result of all this is we programmers are often a frustrated bunch. But should we be? After all, what makes us think that as a species we should have the aptitude to create great software? Our skill sets evolved in an environment that favored those who could hunt boar and find berries – any capacity to succeed in the abstract world of software is pure, accidental side effect. Perhaps we should be awed by software’s successes rather than frustrated by its failures.

The good news is we can improve despite our limitations, and it starts with this: accept that we generally have no innate ability to create great systems, and design our practices around that. It seems like every major step forward in software has followed this pattern of embracing our limitations. For instance, we move to iterative development since we can’t anticipate all variables of a project. We aggressively unit test because we realize we’re prone to error. Libraries derived from practical experience frequently replace those built by expert groups. The list goes on.

This type of admission is humbling, but it can also be liberating. Here’s an example: in years past I would spend hours agonizing over an internal design decision for a system I was building. I figured if I got it right we could easily bolt on some new feature. Sometimes I was right, but often I was not. My code was often littered with unnecessary structure that only made things more complicated.

Contrast that to today: I know I can’t anticipate future needs in most cases, so I just apply this simple heuristic:
  1. When in doubt, do the simplest thing possible to solve the problem at hand
  2. Plan on refactoring later.
The first step frees us from trying to anticipate all future needs – but this is not quick-and-dirty cowboy coding. An essential element of good code is creating an understandable and maintainable system. Don’t try to code for future needs. Instead, structure code for present needs so it can be leveraged in the future.

So how do we do this? A couple things to keep in mind:
  • When in doubt, leave it out. (also known as “You Ain’t Gonna Need It”)
  • Unit-Testable designs tend to be reusable designs. Unit tests not only catch bugs that can result from refactoring, but they encourage modularity to enable that refactoring.
  • Don’t try to design an API independently of an application. You won’t understand your users’ needs well enough to create a good experience. Build the API as part of the application to make sure its needs are met, then factor out and generalize.
  • Group code together that tends to change for similar reasons. If your Widget class is responsible for rendering its UI and writing to the database and business logic, you can’t use it anywhere else without significant changes. High cohesion and loose coupling.
There are no hard-and-fast rules to building software, but we all need a compass to help guide us through the thousands of micro-decisions we make every time we write code. Hopefully this post can help build that compass.