Solving regex crosswords
Regex Crossword is a puzzle game to help you practice regular expressions. I wrote a program to solve them. You can find it on GitHub as lvh/regexcrossword. If you're on amd64 Linux, you can try the demo binary too. This blog post walks you through how I wrote it using logic programming.
To me this game feels more like Sudoku than a crossword. When you solve a crossword, you start by filling out the words you're certain of and use those answers as hints for the rest. Backtracking is relatively rare. In a Sudoku, you might start by filling out a handful of certain boxes, but in most puzzles you quickly need to backtrack. That distinction shapes how I think about solving it: searching and backtracking is a natural fit for logic programming.
Logic programming is a niche technology, but in its sweet spot it's miraculous. I'll introduce the concept to help you recognize what sorts of problems it's good at. Worst case, you'll enjoy a cool hack. Best case, you'll get the satisfaction of writing a beautiful solution to a problem some time in the future.
My programming language of choice is Clojure, and Clojure has clojure/core.logic, a logic programming library. Because logic programming is such a specialized tool, it's more useful to have libraries that let you "drop into" logic programming than to do everything in a full-blown logic programming language.
Approach
It would not be a bad idea to read the instructions for Regex Crossword. I'll also assume you have seen some basic regular expressions. Hopefully the Clojure will be piecemeal enough that you can just follow along, but reading a tutorial wouldn't hurt.
Let's take a look at the first beginner puzzle:
Each regex part applies to some number of unknowns (empty boxes). The first row regex `HE|LL|O+` applies to the unknowns of the first row. The first column regex `[^SPEAK]+` applies to the unknowns of the first column. Both constrain the top left unknown: we're counting on our logic programming engine to be convenient for expressing that "cascading" of constraints.
An unknown is just a logic variable in constraint logic programming. For brevity we'll call them "lvars" ([ell vars]).
Rows, columns (and in later puzzles hexagon lines) are all just layout. Fundamentally it's all just a regex applying to some lvars.
Breaking down regular expressions
Most regular expressions have structure, like `(AB|CD)XY`. We'll solve this problem recursively: if `(AB|CD)XY` matches lvars `p, q, r, s`, then presumably `p, q` must match `AB|CD` and `r, s` must match `XY`.
(There are some counterexamples to the idea that we can solve the entire problem recursively! For example, backrefs within a regex would require a second pass, and backrefs across regular expressions would require another top-level pass. We'll deal with those later. There are ties to language theory here, but that's a story for another day.)
We'll parse the regular expression into a data structure that's easier to handle. Fortunately, there's a piece of code out there that knows how to generate strings matching a given regex, which has a parser we can reuse. That parser is designed for Java's regular expression syntax. It's not quite identical to that of JavaScript, but we're hoping that the puzzles avoid those tricky edge cases for now.
To parse, we use `[com.gfredericks.test.chuck.regexes :as cre]`.
```clojure
(cre/parse "HE|LL|O+")
;; => (read: produces)
;; {:type :alternation,
;;  :elements
;;  ({:type :concatenation,
;;    :elements ({:type :character, :character \H}
;;               {:type :character, :character \E})}
;;   {:type :concatenation,
;;    :elements ({:type :character, :character \L}
;;               {:type :character, :character \L})}
;;   {:type :concatenation,
;;    :elements ({:type :repetition,
;;                :elements [{:type :character, :character \O}],
;;                :bounds [1 nil]})})}
```
Progress! It is indeed an alternation (`a|b`) of a concatenation of H and E, or L and L, or the letter O one or more times.
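Since the parse result is plain Clojure data, you can explore it with ordinary map functions; no special API needed. Here's a quick sketch, with the nested map from the output above lightly adapted (vectors instead of seqs) so `get-in` indexing works:

```clojure
;; The parse tree for "HE|LL|O+" as plain data (adapted from the output above).
(def parsed
  {:type :alternation
   :elements [{:type :concatenation
               :elements [{:type :character :character \H}
                          {:type :character :character \E}]}
              {:type :concatenation
               :elements [{:type :character :character \L}
                          {:type :character :character \L}]}
              {:type :concatenation
               :elements [{:type :repetition
                           :elements [{:type :character :character \O}]
                           :bounds [1 nil]}]}]})

;; Ordinary map access works: the top-level node type...
(:type parsed) ;; => :alternation

;; ...and the first character of the first branch.
(get-in parsed [:elements 0 :elements 0 :character]) ;; => \H
```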
Logic machinery
The logic programming engine is going to search a tree of possibilities for solutions that fit your constraints. Logic programming is inherently declarative: instead of telling the computer how to find the answer, we describe what the answer looks like.
An lvar that doesn't have a value assigned to it yet (it could still be anything) is called "fresh". An lvar with a definite value is called "bound". There is nothing preventing your program from returning fresh variables: that just means the answer doesn't rely on that lvar taking any particular value (like the x in 0x = 0).
In core.logic, a program consists of a series of expressions called goals. `(l/== a b)` is a goal to make the values of `a` and `b` equal. It does not, by itself, compare `a` to `b` or assign anything to anything. It just expresses the idea of comparing the two. Logic programming calls this "unification".
As the engine searches the space of possible answers, sometimes `a` will be equal to `b` already, or `a` will be bound and `b` will be fresh, in which case `b` will become bound to whatever `a` was bound to. Either way, the two variables can be unified and the goal is said to succeed. But in many parts of the tree this doesn't work out: `a` and `b` will already be bound to incompatible values. In that case, the goal is said to fail. Unification is just one of many goals, but it's the most important one in most programs, including ours.
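To make unification less abstract, here's a toy model of it in plain Clojure. This is a sketch for intuition only, not how core.logic represents things internally: lvars are keywords, and a substitution map records what each lvar is bound to. `unify` returns an extended substitution on success and `nil` on failure:

```clojure
(defn lvar? [x] (keyword? x))

(defn walk
  "Chase an lvar through the substitution map until we reach a value or a
  fresh (unbound) lvar."
  [s x]
  (if (and (lvar? x) (contains? s x))
    (recur s (get s x))
    x))

(defn unify
  "Try to make a and b equal under substitution s. Returns the (possibly
  extended) substitution, or nil if they're incompatible."
  [s a b]
  (let [a (walk s a)
        b (walk s b)]
    (cond
      (= a b)   s              ;; already equal: succeed, learn nothing new
      (lvar? a) (assoc s a b)  ;; a is fresh: bind it to b
      (lvar? b) (assoc s b a)  ;; b is fresh: bind it to a
      :else     nil)))         ;; both bound, incompatible: fail

(unify {} :a \H)       ;; => {:a \H}
(unify {:a \H} :b :a)  ;; => {:a \H, :b \H}
(unify {:a \H} :a \E)  ;; => nil
```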
Finally, we need a way to actually run the logic programming engine. That's `clojure.core.logic/run`'s job. Something like:
```clojure
(l/run 1 [q]       ;; run to find up to one answer with logic vars [q]
  (l/== q 'fred))  ;; where q is unified with 'fred
```
... will return `('fred)` (the 1-list with the symbol `'fred` in it) because there's only one answer for `q` that makes all the goals (here just one goal) succeed. `run` takes the maximum number of answers as a parameter. Since you're describing what the answer looks like, there might be zero, one, or any other number of answers. Sometimes the engine will be able to prove there are no other options (because the search was exhaustive) and it'll return fewer. Some programs run forever, some so long it might as well be forever. `run` has a sibling `run*` that gets you all the answers. Here it returns the same result, because there's only one value for `q` that makes all goals succeed.
Character match
The simplest possible regular expression is just a character, which matches itself: `A`. In our parse tree, this is an entry of `:type :character`. We'll write a multimethod dispatching on `:type` to make this work so we can implement other types later.
```clojure
(defmulti re->goal :type)
```
A character looks like this: `{:type :character, :character \L}`. We'll use destructuring to extract the `:character` key. In general, this multimethod will take multiple lvars, though incidentally in the case of `:character` it'll be just one, so we can destructure it as well.
```clojure
(defmethod re->goal :character [{:keys [character]} [lvar]]
  (l/== character lvar))
```
We'll write a test to verify this works.
```clojure
(t/deftest re->goal-character-test
  (t/is (= '(\A)
           (l/run* [q]
             (rcl/re->goal {:type :character :character \A} [q])))))
```
Alternation
Let's look at a few parse trees for some simple alternations:
```clojure
(cre/parse "A|B")
;; => {:type :alternation,
;;     :elements
;;     ({:type :concatenation,
;;       :elements ({:type :character, :character \A})}
;;      {:type :concatenation,
;;       :elements ({:type :character, :character \B})})}
```
Here's an example where the options are different lengths.
```clojure
(cre/parse "AAA|B")
;; => {:type :alternation,
;;     :elements
;;     ({:type :concatenation,
;;       :elements ({:type :character, :character \A}
;;                  {:type :character, :character \A}
;;                  {:type :character, :character \A})}
;;      {:type :concatenation,
;;       :elements ({:type :character, :character \B})})}
```
It's also probably a good idea to consider something with more than two alternatives:
```clojure
(cre/parse "A|B|C")
;; => {:type :alternation,
;;     :elements
;;     ({:type :concatenation,
;;       :elements ({:type :character, :character \A})}
;;      {:type :concatenation,
;;       :elements ({:type :character, :character \B})}
;;      {:type :concatenation,
;;       :elements ({:type :character, :character \C})})}
```
Note that the parser will always create concatenations as elements of alternations, even if the element is really just a single character. We can't handle concatenations yet, so while we generally prefer structures the parser actually produces, we'll cheat with an alternation of bare characters first.
Alternation in logic
The obvious test to write is:
```clojure
(t/deftest re->goal-alternation-test
  (t/is (= '(\A \B)
           (l/run* [q]
             (rcl/re->goal (cre/parse "A|B") [q])))))
```
... but as we saw above, that introduces concatenations. Instead:
```clojure
(t/is (= '(\A \B)
         (l/run* [q]
           ;; "A|B" with the internal concatenations removed
           (rcl/re->goal {:type :alternation
                          :elements [{:type :character :character \A}
                                     {:type :character :character \B}]}
                         [q]))))

(t/is (= '(\A \B \C)
         (l/run* [q]
           ;; "A|B|C" with the internal concatenations removed
           (rcl/re->goal {:type :alternation
                          :elements [{:type :character :character \A}
                                     {:type :character :character \B}
                                     {:type :character :character \C}]}
                         [q]))))
```
We need a way to express disjunction. Like `l/and*`, there's an `l/or*`. Once we have that, alternation is straightforward:
```clojure
(defmethod re->goal :alternation [{:keys [elements]} lvars]
  (l/or* (map #(re->goal % lvars) elements)))
```
This primer is a little unorthodox because we're walking through a problem that requires rule generation. Most introductory texts make you build up static rules. Disjunction is usually expressed with the macro `l/conde`. One difference is that `conde` can express a disjunction of conjunctions in one go:
```clojure
(conde
  [a b]
  [c])
;; is equivalent to
(or* [(and* a b) c])
```
Concatenation
We want to be able to solve squares, not just individual letters. The simplest case is a concatenation of two characters, like `AA`.
```clojure
(def a {:type :character :character \A})
(def aa {:type :concatenation :elements [a a]})

(t/deftest re->goal-concatenation-test
  (t/is (= '((\A \A))
           (l/run* [p q]
             (rcl/re->goal aa [p q])))))
```
A simple implementation passes this test:
```clojure
(defmethod re->goal :concatenation [{:keys [elements]} lvars]
  (l/and* (for [e elements
                v lvars]
            (re->goal e [v]))))
```
(`l/and*` is a function that returns a goal that succeeds when all its goals succeed. We'd use the macro version `l/all` if we were explicitly writing out our goals. Because we're generating them, it's easier to use a function.)
We're clearly abusing the fact that we happen to know that the parts are characters and therefore length 1. But the parts could be repetitions, or an alternation between concatenations each of length >1.
Distributing lvars over groups
Next, we need to find the ways lvars can be distributed over the individual elements. We know that each lvar represents a square and each square must be assigned, so we know that the sum of the lengths of the elements must be the total number of lvars.
We can actually solve this subproblem using logic programming as well! We'll use CLP(FD), which means "constraint logic programming over finite domains".
```clojure
;; [clojure.core.logic.fd :as f]
(l/run* [q]
  (l/fresh [x y z]
    (l/== [x y z] q)
    (f/in x y z (f/interval 2 5))
    (f/eq (= 10 (+ x y z)))))
;; => ([2 3 5] [3 2 5] [2 4 4] [2 5 3] [4 2 4] [3 3 4]
;;     [3 4 3] [5 2 3] [4 3 3] [3 5 2] [4 4 2] [5 3 2])
```
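For intuition, you can cross-check that answer set with a brute-force enumeration in plain Clojure. The logic engine does something much smarter than this, but the answers have to agree:

```clojure
;; Every triple in [2,5]^3 whose sum is 10, by exhaustive search.
(def solutions
  (for [x (range 2 6)
        y (range 2 6)
        z (range 2 6)
        :when (= 10 (+ x y z))]
    [x y z]))

(count solutions) ;; => 12, same as the CLP(FD) run above
```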
As mentioned before, the approach we've taken to this problem is unique because we're generating rules. Most logic programs have a static structure and dynamic data, while our data informs the structure itself. We need to know how many groups there are to know how many lvars to create with `fresh`. `f/eq` is itself a convenience macro that rewrites "normal" Lisp-style equations (like `(= 10 (+ x y z))`) to a form the logic engine knows how to solve directly.
One way to solve that problem is with macros and `eval`, but that might impede our ability to port to e.g. ClojureScript and native-image. Plus, I wanted to learn more about how the finite domain magic works internally.
The main thing `f/eq` does here is introduce intermediate variables. The underlying goal `f/+` only takes 3 arguments: two summands and a sum result. To express a sum over many summands you need intermediate "running total" lvars. I wrote that:
```clojure
(defn reduceo
  "Given a binary operator goal, return an n-ary one.

  If the given goal has shape `(⊕ x y z)` meaning `x ⊕ y = z`,
  `(reduceo ⊕ z vars)` computes `z = vars[0] ⊕ vars[1] ⊕ vars[2]...`."
  ;; lower case are input vars (a, b, c, d...)
  ;; upper case are intermediate accumulator lvars ("running totals")
  ;; Ω is the final result
  ;;
  ;; a ⊕ b = A
  ;; A ⊕ c = B
  ;; ...
  ;; W ⊕ y = X
  ;; X ⊕ z = Ω
  ;; ↑   ↑   ↑
  ;; │   │   └─ (concat accumulators (list result))
  ;; │   └─ (rest lvars)
  ;; └─ (cons (first lvars) (butlast accumulators))
  ;;
  ;; There are two fewer accumulators than there are input lvars. The middle
  ;; equations all use 2 accumulators each: one to carry the previous result
  ;; and one to carry it to the next equation. The first equation uses 2 input
  ;; vars and 1 accumulator, and the last equation uses 1 input var, 1
  ;; accumulator and the result (which may be an lvar, but it's not _our_
  ;; lvar, and in common uses we expect it to be a normal value).
  ;;
  ;; We don't need the butlast because map will only use the shortest coll
  ;; argument.
  [binop result lvars]
  (let [results (-> lvars count (- 2) (repeatedly l/lvar) (conj result) reverse)
        lefts (cons (first lvars) results)
        rights (rest lvars)]
    (l/and* (map binop lefts rights results))))
```
This lets us define a sum goal and a helper function that runs the logic engine for us to find all the ways we can sum up to a number, with some bounds:
```clojure
(def sumo (partial reduceo f/+))

(defn summands
  "Find ways you can add some numbers up to a given total.

  For each number (summand) provide bounds `[min max]` or `nil` if you don't
  know. Possible assignments are returned in the same order."
  [total bounds]
  (let [lvars (repeatedly (count bounds) l/lvar)
        bounds-goals (map (fn [v [min max]]
                            (f/in v (f/interval (or min 0) (or max total))))
                          lvars bounds)]
    (l/run* [q]
      (l/== q lvars)
      (l/and* bounds-goals)
      (sumo total lvars))))
```
Finally we need a way to use these weights to partition up a collection:
```clojure
(defn partition-by-weights [weights coll]
  (loop [[weight & rest-weights] weights
         coll coll
         acc []]
    (if (some? weight)
      (recur rest-weights (drop weight coll) (conj acc (take weight coll)))
      acc)))
```
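A quick check of how it behaves, using keywords as stand-ins for lvars (the definition is repeated here so the snippet stands alone):

```clojure
(defn partition-by-weights [weights coll]
  (loop [[weight & rest-weights] weights
         coll coll
         acc []]
    (if (some? weight)
      (recur rest-weights (drop weight coll) (conj acc (take weight coll)))
      acc)))

;; Weights [2 0 3] carve five squares into a pair, an empty group, and a
;; triple. Zero-length groups matter: they're how something like "A*" can
;; end up matching nothing.
(partition-by-weights [2 0 3] [:a :b :c :d :e])
;; => [(:a :b) () (:c :d :e)]
```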
There's an alternative way to write this using `appendo`: `(l/appendo a b c)` is like `append` but for lvars. You can create a version that supports multiple lists (let's call it `concato`) using the same machinery:
```clojure
(def concato (partial reduceo l/appendo))
```
In a sense `concato` is more direct: the lists are the lists you want; there's no `partition-by-weights` to translate from your numeric answer to the list partitions you need. With all of these goals in place we can implement concatenation. We find each possible weight distribution, and for each distribution, each group must match its set of lvars.
```clojure
(defmethod re->goal :concatenation [{:keys [elements]} lvars]
  (let [nvars (count lvars)
        nelems (count elements)
        bounds [0 nvars]]
    (l/or* (for [weights (summands nvars (repeat nelems bounds))
                 :let [lvar-groups (partition-by-weights weights lvars)]]
             (l/and* (map re->goal elements lvar-groups))))))
```
Fixing :character
What the concatenation problem tells us is that character is broken, too. Concatenation may attempt to assign 2 lvars to a single character. Right now, that returns a goal unifying the first lvar with that character: if there were more, the remaining lvars would simply go unconstrained:
```clojure
(defmethod re->goal :character [{:keys [character]} [lvar]]
  (l/== character lvar))
```
We communicate a problem with a failing goal. If some goal fails, the engine will stop exploring that branch. The always-failing goal `l/fail` is designed for this sort of thing. (You can probably guess what its counterpart `l/succeed` does.) The amended version looks like this:
```clojure
(defmethod re->goal :character [{:keys [character]} [lvar :as lvars]]
  (if (-> lvars count (= 1))
    (l/== character lvar)
    l/fail))
```
You could also do this check in logic programming so that it happens at runtime, but since we're generating our goals ahead of time here and only giving them to the engine at the end, we can short-circuit here.
Repetition
Sample repetition parses
Let's see what the parser does for a few simple repetitions:
```clojure
(cre/parse "A{1}")
;; => {:type :alternation,
;;     :elements
;;     ({:type :concatenation,
;;       :elements ({:type :repetition,
;;                   :elements [{:type :character, :character \A}],
;;                   :bounds [1N 1N]})})}

(cre/parse "A{1,}")
;; => {:type :alternation,
;;     :elements
;;     ({:type :concatenation,
;;       :elements ({:type :repetition,
;;                   :elements [{:type :character, :character \A}],
;;                   :bounds [1N nil]})})}

(cre/parse "A{1,2}")
;; => {:type :alternation,
;;     :elements
;;     ({:type :concatenation,
;;       :elements ({:type :repetition,
;;                   :elements [{:type :character, :character \A}],
;;                   :bounds [1N 2N]})})}

(cre/parse "A*")
;; => {:type :alternation,
;;     :elements
;;     ({:type :concatenation,
;;       :elements ({:type :repetition,
;;                   :elements [{:type :character, :character \A}],
;;                   :bounds [0 nil]})})}

(cre/parse "A+")
;; => {:type :alternation,
;;     :elements
;;     ({:type :concatenation,
;;       :elements ({:type :repetition,
;;                   :elements [{:type :character, :character \A}],
;;                   :bounds [1 nil]})})}
```
This matches our expectation, since:

- `A{n}` means exactly n A's
- `A{m,n}` means between m and n A's, inclusive
- `A{n,}` means n or more A's
- `A*` means 0 or more A's
- `A+` means 1 or more A's
"No bound" is represented by `nil`. Some numbers are represented as just `1`, others as `1N`. The latter are `clojure.lang.BigInt`s, not longs. The parser does this so it can support regexes with pathologically large sizes (longer than a `long`, a signed 64-bit integer, can express). We can ignore that distinction: both behave identically for our purposes.
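A tiny REPL check of that claim; equality and arithmetic treat BigInts and longs interchangeably:

```clojure
(= 1 1N)      ;; => true: numerically equal
(type 1N)     ;; => clojure.lang.BigInt
(+ 1N 1)      ;; => 2N: mixed arithmetic just works
(range 1N 3N) ;; => (1N 2N): fine as bounds, too
```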
Implementing repetition
Repetition generalizes the `:character` behavior we fixed earlier to multiple vars.
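Before looking at the implementation, here's the counting part of the idea in plain Clojure: a hypothetical helper (for illustration only, not part of the actual solver) that, given parsed bounds and a number of squares, lists the repetition counts worth trying. The upper bound is capped at the number of squares, and each candidate count must divide the square count evenly:

```clojure
(defn candidate-reps
  "Repetition counts worth trying for `[lower upper]` bounds over nvars
  squares. Assumed helper, for illustration only."
  [[lower upper] nvars]
  (let [lower (or lower 0)
        upper (min (or upper nvars) nvars)]
    (for [reps (range (max lower 1) (inc upper))
          ;; every square must be consumed: reps must divide nvars evenly
          :when (zero? (rem nvars reps))]
      reps)))

(candidate-reps [1 nil] 4) ;; => (1 2 4): "A+" over four squares
(candidate-reps [2 2] 4)   ;; => (2):     "A{2}" over four squares
```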
```clojure
(defmethod re->goal :repetition
  [{[elem] :elements [lower upper] :bounds} lvars]
  (let [nvars (count lvars)
        lower (-> lower (or 0))
        upper (-> upper (or nvars) (max nvars))]
    ;; Even though e.g. the empty string matches "A*", we get the lvars from
    ;; the structure of the puzzle, so we know all lvars have to be matched.
    ;; Consequence 1: 0 reps only works if there are 0 vars to match.
    (if (zero? nvars)
      (if (zero? lower) l/succeed l/fail)
      (l/or*
       (for [reps (range (max lower 1) (inc upper))
             ;; Consequence 2: can't have any leftovers: must match all parts
             :when (zero? (rem nvars reps))
             :let [group-size (quot nvars reps)
                   groups (partition group-size lvars)]]
         (l/and* (map (partial re->goal elem) groups)))))))
```
Trying to solve a few puzzles
We've done quite a bit of work but still haven't solved any real puzzles. We'd like to solve the game's own puzzles, so let's start from its internal representation. Remember the first puzzle from the beginner set:
Its internal representation looks like this:
```
$ jq '.[1].puzzles[0]' challenges.json
{
  "id": "475e811a-da59-4ce8-9b80-3124b33cc041",
  "name": "Beatles",
  "patternsX": [
    ["[^SPEAK]+"],
    ["EP|IP|EF"]
  ],
  "patternsY": [
    ["HE|LL|O+"],
    ["[PLEASE]+"]
  ]
}
```
(We'll turn the keys into kebab-case keywords so it looks a bit more Clojurey.)
There's something strange about the sample data: `:patterns-x` and `:patterns-y` are both seqs of seqs of patterns. You may have expected them to be seqs of patterns directly. This data structure choice isn't obvious until you hit a later puzzle such as the first level in the "Double Cross" set:
This puzzle is represented internally as:
```clojure
{:id "f0f06b00-ec0a-4572-935d-7459e2a13064"
 :name "Telekinesis"
 :patterns-x [["[D-HJ-M]" "[^F-KM-Z]"] ["[^A-RU-Z]" "[A-KS-V]"]]
 :patterns-y [["[A-GN-Z]+" "[^A-DI-S]+"]]}
```
As you can see, this just represents different patterns affecting the same row. Ironically, while this is presumably intended to make the puzzle harder, the extra constraints probably just make the logic engine's life easier. Either way, solving means we make some lvars for each box, carve them up by rows and columns, and then apply the regex goals:
```clojure
(defn solve
  ([puzzle] (solve puzzle nil))
  ([{:keys [patterns-x patterns-y] :as puzzle} {:keys [n] :as opts :or {n 10}}]
   (let [nvars (* (count patterns-x) (count patterns-y))
         vars (repeatedly nvars l/lvar)
         rows (partition (count patterns-x) vars)
         cols (apply mapv vector rows)
         pattern-goals (map (fn [patterns lvars]
                              (l/and* (map #(-> % cre/parse (re->goal lvars))
                                           patterns)))
                            (concat patterns-x patterns-y)
                            (concat cols rows))]
     (->> (l/run n [q]
            (l/== q vars)
            (l/and* pattern-goals))))))
```
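The row/column carving in `solve` is easy to see with keywords standing in for lvars on a 2×2 grid; `(apply mapv vector rows)` is the usual Clojure transpose idiom:

```clojure
(def vars [:a :b :c :d])

;; Rows: consecutive runs of row-length squares...
(def rows (partition 2 vars))       ;; => ((:a :b) (:c :d))

;; ...and columns are the transpose of the rows.
(def cols (apply mapv vector rows)) ;; => [[:a :c] [:b :d]]
```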
We can use this to solve the first tutorial puzzle:
```clojure
(solve {:id "272901bb-0855-4157-9b45-272935da8c93"
        :name "The OR symbol"
        :patterns-x [["A|B"]]
        :patterns-y [["A|Z"]]})
;; => (((\A)))
```
Unfortunately, the second one has character classes, so we have to implement that next.
```clojure
{:id "6915fafc-3323-484d-b801-5daac73ddb56"
 :name "A Range of characters"
 :patterns-x [["[ABC]"]]
 :patterns-y [["[BDF]"]]}
```
Character classes
Let's look at another parse:
```clojure
(cre/parse "[A]")
;; => {:type :alternation,
;;     :elements
;;     [{:type :concatenation,
;;       :elements
;;       [{:type :class,
;;         :elements
;;         [{:type :class-intersection,
;;           :elements
;;           [{:type :class-union,
;;             :elements [{:type :class-base :chars #{\A}}]}]}],
;;         :brackets? true}]}]}
```
Whoa! This parses to something a lot trickier than you might've guessed. Class sets (intersections, unions and subtraction) are a somewhat obscure set of regex features with slightly differing support across platforms and slightly different syntax. They're particularly useful when you have a full Unicode regex engine and you want to say things like "match characters in this script, but not digits or punctuation". In Java, Ruby and Python, they are spelled something like `[α&&[β]&&[γ]]` where α, β and γ are all their own class specs.
This is a side effect of using a parser intended for parsing Java regular expressions. JavaScript doesn't support these features. Fortunately, they're pretty easy to implement. As usual, we'll tackle the simple base case first.
```clojure
(= '(\A)
   (l/run* [p]
     (rcl/re->goal {:type :class-base :chars #{\A}} [p])))
```
```clojure
(defmethod re->goal :class-base [{:keys [chars]} [lvar :as lvars]]
  ;; It appears class-base will only ever have one char, but I'm writing this
  ;; defensively since I have no proof I've exhausted all the parser cases.
  (if (-> lvars count (= 1))
    (l/membero lvar (vec chars)) ;; vec, bc membero doesn't work with sets
    l/fail))
```
Class union and intersection
While we could cheat when implementing class union and intersection, since the original JavaScript puzzles would never have them, they're trivial to express in logic, so let's just do them properly:
```clojure
(t/testing "class-union"
  (t/is (= '(\A)
           (l/run* [p]
             (rcl/re->goal {:type :class-union
                            :elements [{:type :class-base :chars #{\A}}]}
                           [p]))))
  (t/is (= '(\A \B)
           (l/run* [p]
             (rcl/re->goal {:type :class-union
                            :elements [{:type :class-base :chars #{\A}}
                                       {:type :class-base :chars #{\B}}]}
                           [p])))))

(t/testing "class-intersection"
  (t/is (= '(\A)
           (l/run* [p]
             (rcl/re->goal {:type :class-intersection
                            :elements [{:type :class-base :chars #{\A}}]}
                           [p]))))
  (t/is (= '(\A)
           (l/run* [p]
             (rcl/re->goal {:type :class-intersection
                            :elements [{:type :class-base :chars #{\A}}
                                       {:type :class-base :chars #{\A}}]}
                           [p]))))
  (t/is (= '()
           (l/run* [p]
             (rcl/re->goal {:type :class-intersection
                            :elements [{:type :class-base :chars #{\A}}
                                       {:type :class-base :chars #{\B}}]}
                           [p])))))
```
Implemented by:
```clojure
(defmethod re->goal :class-union [{:keys [elements]} lvars]
  (l/or* (map #(re->goal % lvars) elements)))

(defmethod re->goal :class-intersection [{:keys [elements]} lvars]
  (l/and* (map #(re->goal % lvars) elements)))
```
Ranges
You're probably catching on. Implementing ranges uses a bunch of logic machinery we already have, plus a plain Clojure description of what a range is:
```clojure
(t/is (= '(\A \B \C \D \E \F)
         (l/run* [q]
           (-> "[A-F]" cre/parse (rcl/re->goal [q])))))
```
Implemented by:
```clojure
(defmethod re->goal :range [{:keys [elements]} [lvar :as lvars]]
  (if (-> lvars count (= 1))
    (let [[lower upper] (map (comp int :character) elements)
          chars (map char (range lower (inc upper)))]
      (l/membero lvar chars))
    l/fail))
```
Because classes with multiple ranges, like `[A-Z0-9]`, are implemented using `:class-union`, they just magically work.
Simple classes
"Simple classes" are things like `\d`, `\w`, `\s`. You know how to implement these already.
Class negation
Class negation is actually surprisingly tricky! Many logic systems, core.logic
included, focus on positive statements. There is support for disunification
(e.g. this lvar will never be 5) with l/!=
. There's also more general support
for negation as a constraint, as used here:
```clojure
(defmethod re->goal :class-negation [{:keys [elements]} lvars]
  (l/and* (for [e elements]
            (l/nafc re->goal e lvars))))
```
Unfortunately, `nafc` is marked as experimental, and for good reason. For one, it only works if the vars are ground. The docstring claims execution will be delayed if they aren't... but that doesn't appear to be true. See LOGIC-172 for details. This problem annoyingly surfaced depending on the order clauses were evaluated in (since that changed whether the vars were ground or not). I was able to get it to work consistently by adding a hack to make sure the vars are ground, and adding a test to make sure it would work even if this was the first clause:
```clojure
(defmethod re->goal :class-negation [{:keys [elements]} lvars]
  (let [neg-goal (l/and* (for [e elements]
                           (l/nafc re->goal e lvars)))
        domain (->> (range (int \A) (inc (int \Z))) (map char))
        ground-hack (l/and* (for [v lvars]
                              (l/membero v domain)))]
    (l/all ground-hack neg-goal)))
```
A different way to implement this would be to walk this part of the parse tree manually. Once you find a negation, you look at the rest of the parse tree and aggregate all the classes it covers. Then, you use plain negation (disunification), which has no caveats. This is what I ended up implementing; if you're interested you can read the details in the implementation.
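The aggregate-then-complement idea is easy to sketch in plain Clojure, assuming the same A through Z domain as the ground hack above (the real implementation works on the parse tree; this is just the set arithmetic):

```clojure
(def domain
  "The alphabet the puzzle draws from: A through Z."
  (map char (range (int \A) (inc (int \Z)))))

(defn negated-class->chars
  "Characters a negated class like [^SPEAK] can still match: the domain
  minus every character the inner classes cover."
  [covered]
  (remove (set covered) domain))

(count (negated-class->chars #{\S \P \E \A \K}))  ;; => 21
(take 3 (negated-class->chars #{\S \P \E \A \K})) ;; => (\B \C \D)
```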
Backrefs
Backreferences are an interesting challenge because so far we've just assumed we can recursively turn the regex parse tree into goals piecemeal, and backrefs break that assumption: they necessarily depend on a totally different part of the parse tree. There are two ways to address that:
- Change the implementation to produce a data structure instead of logic goals directly, and then do something eval-like later to turn the data structure into goals.
- Cheat and introduce side effects to propagate information within the tree walk.
The first is clearly the "good" answer. It would also enable other improvements, like tree-level performance optimization. It's also a good chunk of work, and I had spent enough time on this weekend project already. So, instead, I did the hacky side-effecty version. This turned out to be tricky to implement, not from a logic perspective (that part is easy) but because the underlying parser I had forklifted was from a project that didn't support backrefs, and so it didn't bother to parse them out correctly. You can see how I hacked around that in the implementation.
Conclusion
This introduction was a little unorthodox. Most texts focus on introducing different primitives, have you write small static rules, and build up from there. You'll see `conde` long before anyone talks to you about `or*`. Negation-as-failure, which we used for classes, is exotic and non-relational. The way we implemented backrefs works but is inelegant. That's all just a consequence of the problem we solved, not a principled stance on how to use `core.logic`, let alone teach it. Reading The Little Schemer is still a good idea if you want to learn to write your own programs.
(If you want a logic program where the goals are not fixed in advance, you might want to just stick to the JVM so there are no restrictions on eval use. I haven't tried running core.logic on sci.)
Next steps
These are things I thought were neat but didn't make sense in the original text.
Running programs backwards
A truly mind-bending feature of relational programs is that they can be run "backwards". Usually, you write a program "forwards": it has some inputs and you expect some outputs. If you write your logic program a particular way, it doesn't actually know what's an input and what's an output, so you can give it an "output" and it will come up with "inputs" that would have led to that output. This post did not demonstrate that, but it's one of the better hooks I've found to get people excited about logic programming. I could not introduce this better than Will Byrd does when presenting miniKanren. I won't spoil the ending, but go watch that talk.
Our program does not work that way. All our "inputs" are rules, all our lvars are the same kind (unknown boxes), so running it "backwards" doesn't really mean anything. It could, if our parser and rule generation were also fully relational. In that case, we could fill out some boxes and our program would tell us progressively more creative regular expressions that would match them. Instead of a puzzle solver, we would have written a puzzle creator.
Palindromes
Some puzzles have hints that help you solve them. Usually they're thematic cues (e.g. "Hamlet"), which are hard to tell a logic engine about. Structural cues are easy. For example, if you think some lines should be palindromes, you can just add the following constraint:
```clojure
(l/and* (map (fn [vars] (l/== vars (reverse vars))) lines))
```
If evaluated at the right time, this should constrain the search tree. It's tricky to make sure that happens without measurement and fiddling.
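Outside the engine, the same constraint is just an equality check, which is why it's so cheap to state:

```clojure
;; A line is a palindrome when it equals its own reverse.
(defn palindrome? [line] (= (seq line) (reverse line)))

(palindrome? [\H \E \H]) ;; => true
(palindrome? [\H \E \L]) ;; => false
```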
Thematic cues
If you really wanted to add thematic cues, e.g. "Hamlet", you could grab the Wikipedia page, use something like TF/IDF to find important words, filter by the appropriate size, and introduce them as constraints. I don't expect this will speed anything up much even if you discount the initial step.
Hexagon shaped puzzles
I made these work, but didn't hook it up to the solver permanently (mostly because I lost interest). I like hex grids a lot, but they're not interesting from a logic perspective so I didn't cover them in this post. You can check out the implementation but be warned: it's mostly just some tedious bean counting.