# C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

© 2006-2013 Andrew Cooke (site) / post authors (content).

## Notes on State and Reliability

From: "andrew cooke" <andrew@...>

Date: Sat, 13 Jan 2007 18:12:18 -0300 (CLST)

I haven't posted much recently because this last couple of weeks have
been somewhat chaotic as we try to integrate the archive with the
other components of our "end to end" system.  Nothing terrible on our
side, although a problem with hibernate (use Sets not Lists!) resulted
in "can you replace this with JDBC before tomorrow morning?" (we did,
but something else made that particular day's take on the deadline
slip by).

So it's been lots of small details as we connect the Portal - I just
got home from fixing what I hope is the final error (itself a result
of a too-hasty patch that left our object model inconsistent).  As far
as I know, as I type this, everything works.  Yay! (and, as an extra
bonus, on my way out the door I saw Slashdot had sent me Naftalin and
Wadler's Java Generics book to review (later: finished that review
before these notes)).

Meanwhile, in the gaps between the fixes, I've been piecing together a
detailed description of our system, with the aim of reconfiguring
messaging to give a reliable system.  And that has finally forced us
to address an issue that we've skated around various times in the last
year or so: handling state during synchronous messaging.

Background

So, to recap a little, we've been experimenting with a lightweight
approach to SOA.  Services are POJOs; Spring for a container; Mule for
ESB.  It's worked nicely: we have a system that is very flexible, can
be tested easily, and which has the business logic completely separate
from any supporting technology (there are exceptions, but that's the
work in the North, which I'll let someone else justify).

In fact, it's worked so well that we forgot we were writing services.
Which perhaps wasn't that smart a move.

When you work with Java instances it's very natural to call a method
on another instance and then use the result.  That's exactly what our
code looks like, even though those instances are services, and calling
that method may translate into a message exchange with a remote
machine.

To understand what happens, let's take the process step by step.  And
to make the example clearer, let's assume it's the Data Entry Service
(DES) which is managing the ingest of data into the system.  In
particular, it's calling the Scientific Metadata Loader (SML) to store
details about the images in the database.

1 - The DES calls the local SML instance.

2 - The SML instance is not the real service, but an adapter that
generates a POJO message, stores the method parameters in the message,
and passes it to Mule.

3 - Mule serializes the message, puts it in an envelope, adds some
messaging metadata to the envelope, and hands it to the transport.

4 - The transport delivers the envelope to a remote Mule instance.

5 - The remote Mule instance opens the envelope, deserializes the
message, and calls a method on the local target, passing the message
as a parameter.

6 - The local target is another adapter, which unpacks the original
method parameters from the message and calls the real SLM
implementation.

7 - The SLM does its thing and then returns a result (or throws an
exception).

8 - The target adapter packages the result (or exception) in a
response message and hands it back to the (waiting) Mule.

9 - Mule serialises the response, puts it in an envelope whose
(return) address is taken from the original envelope's messaging

10 - The transport carries the envelope back to the Mule instance at
the DES.

11 - Mule at the DES returns the envelope contents to the adapter
within the DES instance.

12 - The adapter returns the result from the response message (or
raises the exception).

13 - The DES gets a return value.

I've simplified things slightly, but that's the general idea.  The
adapters can be generated dynamically and are part of our messaging
infrastructure, not Mule - for more information see
http://www.acooke.org/esoa.pdf

So it's a bit complicated, but that's what infrastructure is for.  Why
should we care?  We care because we want a reliable system.  But to
understand that we first need to be clear about what "reliable" means.

Reliability

In broad terms, a reliable messaging service "gets the message there".
That's the guarantee - that one day, come rain or shine, that message
will be delivered.  There are more details, of course - things like
timeouts, priorities, broadcast, discovery - but the baseline
functionality is that the message is delivered, eventually.

(Even this simple guarantee takes a fair amount of effort, since it
must happen no matter how unreliable the two endpoints and the
underlying network)

That's reliable messaging, but what is a reliable system?  A reliable
system must continue to work even if the individual services fail.
Obviously there are degrees of reliability - how many alternative
services do you need - but the baseline is that even if you only have
"one of each" service, the system as a whole should continue to work
if services are only present intermittently.

So a reliable system extends the idea of reliable messaging - it will
happen eventually - to include the services themselves.

State

For a service to be reliable it must persist its state.  A simple
naming service that provides a guaranteed unique name, for example,
must be able to continue meeting that guarantee across restarts.

Care is needed when coupling messaging and services together.  To
understand why, consider the following scenario:

1 - A message is delivered to a service's queue.

2 - The service processes the message and persists its state.

3 - The message is deleted from the queue.

If the service fails between 2 and 3 then, on restart, it will process
the message again.  There are two solutions to this problem: either
use transactional processing or make the service idempotent.

Transactions can guarantee that message deletion and service
completion occur together as single action.  They typically require
support from the container, but can be added to a single service
without altering the entire system.

Idempotent services are those that will handle repeated messages
correctly.  They do not necessarily require transactions, but may
impose other restrictions (limited parallel throughput, or extended
information - for example, a serial number - in messages) that can
impact the rest of the system.  One advantage of idempotent services
is that they allow easier "replay" of failed transactions, which can
be useful with end-to-end verification (see below).

Orchestrators

Particular care is needed with orchestrators - services that call
other services in a sequential manner.  The natural way to program an
orchestrator (or composite service), particularly in a lightweight
framework like ours, is as a series of synchronous calls to other
services.  This is implemented in Mule via the process described
earlier.

However, there is a problem with this approach: it implies that state
persists within the service during sequential calls.  Our discussion
of state, above, was for a service whose state exists only during a
local transaction.  If we use a transaction for a composite service
then it includes the calls to sub-services.

Extended transactions may be possible, with the help of a suitable
framework, but require collaboration across the entire system.  It is
therefore worth considering other solutions.

The most obvious candidate for simplifying the management of state in
orchestrators is probably to restrict transactions to the processing
between message calls.  So a transaction starts on receipt of a reply
and continues until a message is dispatched to the next service in the
chain.  In this simple form, however, there are two problems.

First, with no global transaction, the system as a whole cannot revoke
a process that fails after it has been processed by several services.
It may be possible to design the system so that this is not a problem
- perhaps intermediate actions do no harm, or can be explicitly
revoked, or corrected by using remediation and idempotent services (ie
replaying the process once the problem is fixed).

Second, more seriously, we now have two conflicting approaches for
identifying state with a message.  On the one hand we are starting a
new transaction for each incoming reply; on the other we are still
sending replies to a particular instance of our orchestrator (step 11
of the long worked example at the top of the discussion).

To understand how this is a problem, consider what happens if the
orchestrator goes down in the middle of a conversation.  When the
sub-service finishes processing it returns a value, but Mule will be
unable to deliver that value to the correct instance, even if the
orchestrator re-starts.  This is because we have only done "half the
job" when we introduced transactions between messages.  To make each
orchestration step equivalent to a simple service we need to do more:
we need to stop relying on a particular instance to hold the state
and, instead, retrieve it when we accept the incoming response.

In other words, we need to break down the orchestrator into a series
of independent "mini services".  Synchronous messaging that replies to
a single instance is replaced by an asynchronous response to the next
"step".

By now it should be clear that we end up re-inventing the workflow
based engines (state machines) that are programmed in BPEL, etc.

Returning to lightweight SOA with POJOs, there's one conclusion to
draw from this - reliable synchronous between POJO instances don't
give a reliable system.  We cannot use the instance to find the
orchestrator's state.  Instead, we must configure the system to use a
singleton orchestrator, for that singleton to be used for replies even
if restarted, and use some key (probably stored in the message
metadata) to retrieve the appropriate state.  Alternatively, we could
use a third party orchestrator.

End-to-End Verification

In some cases a reliable system (in the sense described above) may not
be necessary.  If the process has a well-defined success, and all
services are idempotent, then an unreliable system can give guaranteed
results by monitoring the end-to-end performance and repeating
messages as necessary.

This is rather brutal, but effective.  Reliable messaging may still be
used - typically to reduce failure rates over unreliable networks -
but there is not attempt to make the system piecewise reliable.  Apart
from idempotent services the system must also be moderately reliable
and have spare capacity (I naively expect capacity to be inversely
proportional to reliability - if 50% of messages fail you need to send
them twice).

Summary

Persistent workflows, transactions, and reliable messaging are
complementary.  Their relationship can be understood by studying the
management of state within a system.

Within a piecewise reliable system, synchronous reliable messaging
between service instances makes little sense.  It may be a useful
tool, however, if only end-to-end reliability is required and all
services are idempotent (handle repeated messages gracefully).

I've written all this down because (1) it saves me having to rethink
it each time I want to understand what tradeoffs are possible; (2) if
I've made a mistake hopefully it will become clear; and (3) I have a
hell of a time finding this anywhere else - which is OK if you just
follow the standard conventions, but frustrating if you want to
understand why you are doing so.

Credit

Thanks to coworkers at NOAO for helpful discussions on this,
particularly Alvaro and Evan.

### State and Reliability II

From: "andrew cooke" <andrew@...>

Date: Sun, 14 Jan 2007 09:17:02 -0300 (CLST)

Two things I forgot to mention earlier:

- Choosing where you deploy services can make a big difference.  With the
lightweight POJO approach it is easy to deploy a cluster of services in
the same JVM.  This removes any concern about intermediate messaging, but
may introduce issues with composing transactions.

- Synchronous reliable messaging with naive POJO composite services
(orchestrators) will require remediation whenever a service with pending
messages goes down (since the response has lost the target instance).

Andrew

### State and Reliability III

From: "andrew cooke" <andrew@...>

Date: Fri, 19 Jan 2007 14:52:19 -0300 (CLST)

- A possible solution is described at
http://www.acooke.org/cute/HowToMakeP0.html

- An additional problem is static verification of synchronous/asynchronous
settings.  I was just revising (ie still trying to understand!) a config
and found that some of these settings were incorrect.  This would cause
runtime errors.  Is there any way to "bind" this earlier?

Andrew