# C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

© 2006-2015 Andrew Cooke (site) / post authors (content).

## Notes From a Cache Rewrite

From: andrew cooke <andrew@...>

Date: Fri, 12 Oct 2012 05:42:00 -0300

Update: A higher-level, easier to read account is now available.

You can jump directly to Implementation or Lessons Learned if you don't want the background details.

Parts below are critical of the original design; since I had a chance to comment on the original design, before it was implemented, this is largely self-criticism.

# The Original Cache

## Background

The cache provides a local Python interface to a remote service (called the "source" below) that speaks XMLRPC.

For confidentiality reasons I don't want to use the original domain model here, but it was analogous to the following: a set of workers that are continuously measured for performance against various metrics; each metric has ratings; each worker has a history of what periods they had which ratings; the rules used to rate workers may vary from individual to individual; several other worker attributes are stored.

So it's a fairly complex, hierarchical set of data that includes time series.

The source provides various views into these data. It has evolved over time and duplicates information in various places. A typical call might return ratings over some time range, for a given worker, or all workers in a certain class. All data are structured in terms of lists and maps (XMLRPC structs).

The original cache reified the model described above as Python classes. The structure of these classes is pretty much as expected (a Worker contains Rules and Metrics, Metrics contain time series of Ratings, etc etc).

## Problem 1 - Schizophrenic API

There are two different "visions" of the cache API in the description above. One is a service that provides "fragments" of data (presumably tailored to particular, historical needs). The other is a graph of related objects.

Either vision could be used to provide an API, but the initial code tried to provide both. So, for example, you could request a fragment of a time series (and receive the appropriate objects). Or you could request a Worker instance (say) and then use methods on that to locate the related data (ie navigate the graph of connected objects).

Unfortunately, to provide all this in a consistent way, without duplicating or omitting data, introduces complexity:

• To guarantee a consistent graph, the cache would need to maintain references to generated objects so that it could "patch them in" to new, related data, as they are requested.

• To support retrieval of related data, some backlinks were needed within the graph of data model instances.

And in practice, this was incompletely implemented (fragments were disconnected). The result was inconsistencies and duplications in the retrieved data.

## Problem 2 - Multiple Data Models

The "object-graph" part of the API was new to the cache. Code in the source (and other, related services) structured the data in a more tabular form - as nested lists and maps. The XMLRPC calls returned sections of this tabular structure.

This led to a mindset which considered the tabular form of the data as fundamental. So the original cache used this as the internal data format.

Since data within the cache were stored in tabular form, rather than as instances of the model provided in the API, the cache was not able to "patch together" a consistent graph (see previous section).

It was also unclear exactly what data had already been retrieved, since the source provided a "patchwork" view over the table and the format had no natural way to indicate where data were missing.

And, finally, the tabular format was undocumented and open to change. The only reliable reference was the implementation used in the source, but although that was available as a separate module, it was re-implemented for the cache.

## Problem 3 - Granularity And Efficiency

The main cost (in time spent) when using the source is constant per-connection. So it makes much more sense to request a large amount of data in a single call, rather than across multiple calls.

Clients that used the cache, therefore, would "pre-load" the cache by making a very general call (eg current health for all Workers) before requesting more specific data.

Because of the limitations of the internal tabular format (see above) the cache was unable to detect what data had been pre-loaded. So the API was extended to provide explicit control for when to use the internal table and when to contact the source.

This, in turn, led to more complex clients.

## Problem 4 - No Generations

Even though the source updated in definite steps the protocol did not include version data.

This, together with the fragmented view of data provided by the source, led to overlapping, inconsistent data being retrieved over time.

Also, the cache was implemented / used as a single instance, so any "flush" operation would apply to all cache users. This effectively forced single-threaded use.

## Problem 5 - Not Distributed

Because the cache was in-memory (in Python code) it could not be shared across machines. Clients on multiple machines produced repeated, duplicate requests to the source.

# The New Cache

## Background

Work for the new cache was motivated by problem 4 (no generations) above. Multiple clients were over-taxing the server; a new implementation that used memcache to support distributed clients was requested.

The new cache implementation had to be generally compatible with the initial version, but client code was available and could be modified to accommodate small changes, if necessary.

Not all clients of the source use the cache; some call the protocol "directly" (via Python's XMLRPC libraries). The code for these clients could not be modified, so modifications to existing XMLRPC procedures were not acceptable.

## Solution 5 - Distributed Cache

As described above, memcache was used to cache the data received from calls to the source. The key is, essentially, the URL; the value is the returned data.

## Solution 4 - Generationed Data

The XMLRPC protocol was extended to include information on the current version of data in the server.

The source already contained two separate generations of data (one current, one being constructed). It was relatively easy to extend this to a configurable number of generations, providing access to each via a new, different port (while preserving access to the "current" generation through the original port). This allows cache instances to access a particular generation without changing the existing protocol.

Memcache keys include the source address (with port, which is tied to generation number). This allows multiple generations to be cached without overwriting data.

To simplify management of generations in clients, the new cache implementation has a method that returns a fresh instance for the latest generation. This, combined with other changes (below) that remove the need for explicit cache control, supports a much simpler pattern of use: a cache is instantiated for a single, short task (eg generating a web page) that requires consistent data from a single generation. Re-running the task, or similar tasks in other threads, use a new instance of the cache.

Finally, a new service was introduced to the system, called the Primer. This monitors the source, detects a new generation, and pre-loads commonly-used, generic data (see below) to memcache. It then modifies an entry in memcache that describes the current generation; this allows new cache instances to identify and access the latest data.

## Solution 3 - Expansion To Generic Requests

Although the original cache interface was very broad, the problems outlined earlier led to it being used in a very restricted way - global information was loaded (ie health for all Workers) and then, for each Worker that required further data, all data was loaded for that instance.

So, in practice, only two source calls were used to populate the cache.

The new cache intercepts all calls and generates responses from the internal model (described below). Because the internal model contains sufficient information to identify incomplete data the appropriate calls to the source can be made automatically. Only the two calls described above are used, so the implementation is quite simple.

For examples: an initial query to retrieve a single Worker will trigger the retrieval of all Workers (the "global information" described above); the first request for detailed data for a given Worker will load all data for that instance.

## Solution 2 - Single Data Model

The data stored in memcache are direct snapshots of the source calls. The new cache uses that data to construct an in-memory graph of instances using the same classes exposed by the cache API.

As described above, the graph is constructed in two stages. First, "global data" are loaded; second, calls to retrieve data for particular Workers trigger retrieval of the data for those instances (only). It may seem costly to keep this in memory, but since they are also the results returned to clients they are likely held in memory by the caller. And cache lifetimes tend to be short (discarded for each new generation), so the in-memory cache does not have time to grow too large.

There is no attempt to reconstruct the "tabular" format used in the source (cf the old client, as described above).

## Solution 1 - Emphasise One Vision Of API

The new client prioritises the "graph of objects" API. Calls to request fragments are still supported, but trigger the creation of the appropriate section of the object graph before returning a filtered subset of that model.

This guarantees (with generationed data) a consistent set of results.

# Implementation

The cache has four responsibilities:

1. Expansion of specific calls (targeting a particular detail) to generic requests (targetting all workers, or all information for one worker).

2. Caching responses from the source in a way that can be accessed by a distributed set of clients.

3. Translation of (tabular) responses into data model instances.

4. Management of generations.

It is implemented using a layered approach. The lowest level is a "dumb" wrapper around memcache. For each method in the cache API (which corresponds pretty closely to each XMLRPC procedure) memcache is checked; if no data are available a call is made to the source and the results stored in memcache and returned to the caller.

The data model is a separate module that includes translation from the tabular format (as received from XMLRPC) into the graph of objects. The lowest level cache translates each fragment separately, so the client receives unconnected fragments on successive calls.

The middle level of the cache manages an in-memory set of Worker instances. The first call for a Worker is converted to a call (via the lower level) for all workers; the results (as data model instances) are stored by name. Subsequent calls then pull Workers directly from this in-memory store.

The top level of the cache is similar to the middle level, but intercepts calls for detailed data. The parent Worker is retrieved from the mid-level and, if that is missing the required information, a request is made for all data for that Worker; the response is used to construct the entire graph. Finally, the client receives the appropriate fragment taken from the complete, consistent object graph. In this way the client receives connected fragments.

This is implemented using inheritance: the lowest level is the base class; the middle and top levels are successive sub-classes.

The base level must be generation-aware as the memcache keys depend on the generation used. This level also implements the method that returns a new cache instance for the latest generation. By using type(self)(....) (and the same constructor arguments for all classes) sub-classes are automatically accommodated. The previous cache implementation was modified return itself when this call is invoked. This allows it to be used as a direct replacement when generations are not required.

Preserving the cache API and implementing functionality in progressive layers made it easy to test for consistency during development.

# Lessons Learned

In no particular order:

• Identify common usage patterns and use those to simplify how data are loaded. It's OK to load too much data, within reason (and in a sense it gets more OK as the number of clients increases, since they are all doing the same, and so hitting memcache, and not the source).

• Make generations (if you need them) an end-to-end property. In other words, think about this as early as possible in system design.

• Use a new instance of the cache for each generation (if this is returned from a method on the previous cache then a non-generational cache can return itself).

• Carefully consider the balance between distributed and in-memory stores. We keep a snapshot of the returned data in memcache, but save the "processed" object graph in memory - this reduces server load while keeping the caches responsive to simple queries.

• Try to develop the cache and source in tandem, so all clients can be isolated from the protocol details.

• Make memcache a direct copy of the wire data. This makes it easy to "read through" and supports alternative cache implementations (eg with a different data model implementation).

• Take care to separate the "direct cache" from any "translation layer" that converts the results into a more usable data model.

• Make sure that you can identify "missing data" in your data model.

Following these gave a replacement cache that is significantly simpler than the original implementation, while leaving the cache API, service protocol, and cache clients largely unchanged. It also performs better with multiple, distributed clients and provides stronger guarantees about the consistency of the results (both in time - via generationed data - and "space" - as a single connected graph of data model instances).