From:
"andrew cooke" <andrew@...>
Date:
Mon, 4 Jun 2007 06:35:15 -0400 (CLT)
I have implemented various versions of a "Parallel Sudoku Solver" in
Erlang. These notes try to capture some of the lessons I have
learned.
They are written with an eye on Jonathan Sillito's recent work adding
similar (message sending) functionality to Ruby - I think there are
some interesting issues about "transplanting" ideas from one language
to another (one that includes mutable state but excludes efficient
tail calls) and, at a more practical level, implementing state
machines in an OO language (something I have worried about,
intermittently, for years).
Implementation Details
A quick reading of the latest code at
http://www.acooke.org/cute/ReducedRan0.html shows that it is an
almost-declarative description of a state machine. Each state
(starting, searching, asserting, swapping, sleeping) is embodied as a
function.
Messages are received in a particular state. The logic in that state
(compactly described with pattern matching) selects a transition.
Transitions are helper functions that may send a response message
before calling a new state.
Tail calls are used throughout. Since Erlang "optimises" such calls
no stack is consumed in the transition and the implementation does not
have to be concerned with "returning" (an alien concept for state
machines).
Some messages are common to more than one transition. These are
handled by a single function which takes the following state as an
argument (first class functions simulating continuation passing).
Traditional Functional Strengths
The code illustrates how Erlang plays to functional programming's
traditional strengths:
- Pattern matching (extended in Erlang with conditions using a
restricted range of predicates) allows program logic to be expressed
in a clear, concise manner.
- Explicit state and tail call optimisation simplify transitions
between states.
- First class functions allow easy generalisation of transitions.
Together these make it easy to directly embed a state machine in the
source code.
Erlang also provides, in the OTP platform, "behaviour" support for
state machines. I have not yet investigated this.
No Magic Bullet
Erlang's message passing approach does not "magically" fix the
problems inherent in concurrent software. During development I
stumbled across deadlocks and inconsistent state more than once. What
it does, however, is allow the code to be structured so that the
assumptions, state and logic are easy to read. This helps ad-hoc
reasoning while debugging and allows issues to be resolved relatively
quickly (this is a relatively subjective view and I have no more
objective evidence).
Lessons Learned
The following approaches (patterns?) were helpful:
- Match for unexpected messages (and fail with a message when you
receive them). This is analogous to the default case in case
statements.
- Use a fixed format for mutable state (ie the arguments to the
functions that represent states in the FSM). This allows general
transitions which take the "next" state/function as an argument.
- Use tagged tuples for messages (ie first tuple value is an atom that
"names" the message). This is a recognised custom in Erlang and
gives ad-hoc typing.
- Use {name, {payload...}} pairs for messages (ie the payload is a
separate tuple). This allows matching on the name (eg {type,
_Payload}) without hard-coding the payload structure in each
pattern.
- Fail deeply. When failure occurs, fall back to as basic a state as
possible. Avoid states that loop on failure, immediately repeating
the action, since this encourages deadlock (an example directly from
the code: in the swapping state, failure to swap was initially
handled by a second attempt at swapping. This eventually lead to
deadlock with all cells requesting swaps from other cells.)
- Avoid invariants. This is a hard one, because invariants are so
useful, but maintaining an invariant across several threads requires
detailed negotiation that cannot always "fail deeply" (in the
"community invariant" described below a cell that attempts to swap
its value with another cell cannot process further requests that
modify its state - assertions or swaps - until it has a reply from
the swap partner).
- Use explicit, restricted protocols. "Fail deeply" and "avoid
invariants" drive the system towards a "permissive" protocol. This
should be resisted as much as possible, since it becomes difficult
to detect errors (ie "Match for unexpected messages" becomes less
useful).
Ruby Library
My main concern for the Ruby library is that a lack of efficient tail
calls means that complex state machines cannot be easily separated
into distinct functions (one per state).
There seem to be two solutions: either use trampolining or represent
state as a variable rather than a function. It would be interesting
to explore these two avenues - it may be that they suggest an
alternative syntax better suited to Ruby.
States as Objects
The natural approach to state machines in an OO language without
support for unlimited tail calls seems to be:
- Each state is an object
- Each state implements a standard interface ("run" or similar) that
returns the *next* state
- Have a outer object (the state "machine") that manages the process
of calling the standard interface on the successive states.
- As I noted 8 years ago - http://www.acooke.org/andrew/smachine.html
- machines can themselves be states.
Reflecting a little on the comments above it becomes clear that this
combines both approaches described in the previous section - the state
is a variable whose value is the current state and the outer machine
object is the "trampoline".
So I guess I have answered my own question - it should be possible to
adapt the code to use an approach like this. The remaining issues (eg
does each state create the "next" state on every call, or is there a
pool?) seem to be mainly book-keeping.
Translating to Ruby
If the above is correct then most comments here also apply to code in
Ruby. The main changes are:
- Tail calls are replaced by the trampolining approach described
above.
- "Generic" transitions would be methods in a shared superclass rather
than functions that take a "next state" function.
Algorithm Discussion
The algorithm used is, obviously, a poor choice for solving Sudoku
puzzles. Peter Norvig has already given a textbook-perfect solution
at http://norvig.com/sudoku.html
However, the parallel approach it is still useful as an example for
learning Erlang/message passing:
- It addresses a fashionable issue, which is motivating (received
attention from Reddit, for example).
- The initial approach is manageable for a newbie.
- Progressively more complex solutions to be introduced (eg
maintaining the "community invariant" so that one of each digit is
used in each "block of 9").
- Finally, it is also possible that in some situations (although, in
my experience, less often than one believes when faced with an
apparently difficult problem) "guided random" exploration of the
solution space is the only tool left.
Motivated by the last point, and because I find the issues
interesting, I will use the remaining sections of this note to comment
on the various strategies used (the history is somewhat simplified to
present the main points - in practice there were many more dead-ends
and mistakes).
Algorithm: Random Exploration
The original implementation modelled each cell as an independent
entity aware only of those cells with which it must not (in the final
solution) conflict. At random it would send a message to one of these
"competitors" asserting a value. If the two cells conflicted, one of
the cells had to change to a new (random) value.
While this is pretty much brain-dead as a practical optimisation
scheme it has one critical property that we rely on and must preserve:
its fixed point is the solution required. In other words, if it ever
reaches a consistent solution it will not change values further
(because there are no conflicting cells). This allows us to detect a
successful solution by monitoring the (lack of) activity in the
system.
Algorithm: Community Invariant
The solution space explored initially is huge (9^81 = 2e77 candidate
solutions). This can be drastically reduced by guaranteeing that each
digit from 1 to 9 is used just once in each "block" on the sudoku
grid. This reduces the number of candidate solutions to 9!^9 = 1e50
(an improvement factor of 2e27).
Implementing this invariant took some care - see the "Lessons Learned"
for more details.
Algorithm: Reduced Range
Cells maintain a list of "legal values" which is modified when they
interact with a "given" value. In this way Cells "learn" to avoid
values that are guaranteed to conflict. This is "first order"
constraint propagation. I have not measured the effect of this change
but we can make a rough estimate: if each digit can conflict with two
given values, on average, then any particular arrangement of a block
has a probability of (7/9)^9 = 0.1 of being correct (wildly assuming
independence). That reduces the number of candidate solutions by a
further nine (number of blocks) orders of magnitude.
Algorithm: Weighting
Each cell maintains a "weight" that reflects the "confidence" in the
current value. New values have zero weight, and weight increases when
an assertion from a neighbouring does *not* conflict.
Weighting can be used to guide behaviour when two cells interact,
comparing and exchanging values. When comparing values the cell with
lower weight is the one that changes; swapping is only successful with
a lower weighted value in the same block.
In this way weighting increases the persistence of more likely values
and so "guides" the system to a self-consistent solution.
Algorithm: Greediness
The simple weighting described above is too strong. It encourages the
growth of self-consistent "structures" of cells, but does not allow
these structures to be discarded when they block the correct solution.
The "greediness factor" introduces some noise into the weighting
decisions. A greediness of 0.9 means, roughly, that weighting is
followed only 90% of the time. In this way even strongly weighted
cells can be forced to change value.
Watching the program run (if configured to print a snapshot every
second or so) it is easy to see how varying the greediness parameter
affects the solution search. A value of 0.9 gives a system that
oscillates between fairly long periods of detailed refinement, when
only a few "uncertain" values are changed, and shorter intermittent
sessions of more violent change, when nearly all values value. With a
value of 0.5 the detailed searches are shorter; "wide-scale" changes
are more common.
While useful, an adjustable parameter requires tuning, which is
difficult when the test case takes several hours to run. Also, there
is no guarantee that a single "good" value of greediness exists for
all puzzles.
Andrew
Parallel Sudoku solver in Stage
From:
"andrew cooke" <andrew@...>
Date:
Tue, 5 Jun 2007 07:01:39 -0400 (CLT)
Jonathan Sillito's work, which I mentioned above, is now online at
http://sillito.ca/stage/sudoku.html
Andrew