| Andrew Cooke | Contents | Latest | RSS | Twitter | Previous | Next


Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Lepl parser for Python.

Colorless Green.

Photography around Santiago.

SVG experiment.

Professional Portfolio

Calibration of seismometers.

Data access via web services.

Cache rewrite.

Extending OpenSSH.

Last 100 entries

British Words; Chinese Govt Intercepts External Web To DDOS github; Numbering Permutations; Teenage Engineering - Low Price Synths; GCHQ Can Do Whatever It Wants; Dublinesque; A Cryptographic SAT Solver; Security Challenges; Word Lists for Crosswords; 3D Printing and Speaker Design; Searchable Snowden Archive; XCode Backdoored; Derived Apps Have Malware (CIA); Rowhammer - Hacking Software Via Hardware (DRAM) Bugs; Immutable SQL Database (Kinda); Tor GPS Tracker; That PyCon Dongle Mess...; ASCII Fluid Dynamics; Brandalism; Table of Shifter, Cassette and Derailleur Compatability; Lenovo Demonstrates How Bad HTTPS Is; Telegraph Owned by HSBC; Smaptop - Sunrise (Music); Equation Group (NSA); UK Torture in NI; And - A Natural Extension To Regexps; This Is The Future Of Religion; The Shazam (Music Matching) Algorithm; Tributes To Lesbian Community From AIDS Survivors; Nice Rust Summary; List of Good Fiction Books; Constructing JSON From Postgres (Part 2); Constructing JSON From Postgres (Part 1); Postgres in Docker; Why Poor Places Are More Diverse; Smart Writing on Graceland; Satire in France; Free Speech in France; MTB Cornering - Where Should We Point Our Thrusters?; Secure Secure Shell; Java Generics over Primitives; 2014 (Charlie Brooker); How I am 7; Neural Nets Applied to Go; Programming, Business, Social Contracts; Distributed Systems for Fun and Profit; XML and Scheme; Internet Radio Stations (Curated List); Solid Data About Placebos; Half of Americans Think Climate Change Is a Sign of the Apocalypse; Saturday Surf Sessions With Juvenile Delinquents; Ssh, tty, stdout and stderr; Feathers falling in a vacuum; Santiago 30m Bike Route; Mapa de Ciclovias en Santiago; How Unreliable is UDP?; SE Santiago 20m Bike Route; Cameron's Rap; Configuring libxml with Eclipse; Reducing Combinatorial Complexity With Occam - AI; Sentidos Comunes (Chilean Online Magazine); Hilary Mantel: The Assassination of Margaret Thatcher - August 6th 1983; NSA Interceptng Gmail During Delivery; General IIR Filters; What's happening with Scala?; Interesting (But Largely Illegible) Typeface; Retiring Essentialism; Poorest in UK, Poorest in N Europe; I Want To Be A Redneck!; Reverse Racism; The Lost Art Of Nomography; IBM Data Center (Photo); Interesting Account Of Gamma Hack; The Most Interesting Audiophile In The World; How did the first world war actually end?; Ky - Restaurant Santiago; The Black Dork Lives!; The UN Requires Unaninmous Decisions; LPIR - Steganography in Practice; How I Am 6; Clear Explanation of Verizon / Level 3 / Netflix; Teenage Girls; Formalising NSA Attacks; Switching Brakes (Tektro Hydraulic); Naim NAP 100 (Power Amp); AKG 550 First Impressions; Facebook manipulates emotions (no really); Map Reduce "No Longer Used" At Google; Removing RAID metadata; New Bike (Good Bike Shop, Santiago Chile); Removing APE Tags in Linux; Compiling Python 3.0 With GCC 4.8; Maven is Amazing; Generating Docs from a GitHub Wiki; Modular Shelves; Bash Best Practices; Good Emergency Gasfiter (Santiago, Chile); Readings in Recent Architecture; Roger Casement; Integrated Information Theory (Or Not); Possibly undefined macro AC_ENABLE_SHARED; Update on Charges

© 2006-2013 Andrew Cooke (site) / post authors (content).

Offside Parsing Works in LEPL

From: andrew cooke <andrew@...>

Date: Sat, 12 Sep 2009 12:14:54 -0400

I just got a complete test working for offside (whitespace/indentation
sensitive) parsing working in LEPL (my Python parser -

What follows is a re-formatted version of a test from this file -

Here's the grammar (note that I have hardly any structure - there's no
clear definition of statements or commands or variables, it's just
enough to use the indentation-aware code):

# these are the basic tokens that the lexer
# recognises - whitespace is then handled
# automatically
word = Token(Word(Lower()))
continuation = Token(r'\\')
symbol = Token(Any('()'))

# the ~ here means these are used to match
# but discarded from the results
introduce = ~Token(':')
comma = ~Token(',')

# first we need to define how a single
# logical line can continue over many
# lines in the text
CLine = CLineFactory(continuation)

# if we don't want lines to continue,
# we could just use the BLine() matcher

# next a minimal language definition that
# says statements are sequence of words
statement = word[1:]

# argument lists can extend over multiple
# lines (the parser will "know" their extent
# because they are inside (...))
args = Extend(word[:, comma]) > tuple

# and a function header is some words followed
# by the argument list
function = \
  word[1:] & ~symbol('(') & args & ~symbol(')')

# now we get to the interesting part.  we
# introduce blocks, which are indented
# relative to the surrounding text
block = Delayed()

# and lines which are what are inside blocks.
# note that a block is a valid line
# because we can nest blocks, and an empty
# line can appear too.  finally we collect
# the output in a Python list so we can
# see the structrue in the result
line = Or(CLine(statement),
          Line(Empty()))        > list

# now we can define the block: it comes
# after a function header or statement
# (both those end in introduce - ":") and
# contains lines.
block += \
  CLine((function | statement) & introduce) \
  & Block(line[1:])

# and a program is a list of lines.
program = (line[:] & Eos())

# the usual LEPL way to make a parser,
# with a new configuration type. the
# policy argument is the number of spaces
# needed in an indent for a single block.
return program.string_parser(

And here's the text that we will parse:

this is a grammar with a similar
line structure to python

if something:
  then we indent
  something else

def function(a, b, c):
  we can nest blocks:
    like this
  and we can also \
    have explicit continuations \
    with \
any \

same for (argument,
  which do not need the
  continuation marker

Running the parser against that text gives the following, where the
nested lists indicate that we have matcher the block structure

[ [],
  ['this', 'is', 'a', 'grammar', 'with', 'a', 'similar'],
  ['line', 'structure', 'to', 'python'],
  ['if', 'something',
    ['then', 'we', 'indent']],
    ['something', 'else'],
  ['def', 'function', ('a', 'b', 'c'),
    ['we', 'can', 'nest', 'blocks',
      ['like', 'this']],
    ['and', 'we', 'can', 'also', 'have', 'explicit',
     'continuations', 'with', 'any', 'indentation'],
  ['same', 'for', ('argument', 'lists'),
    ['which', 'do', 'not', 'need', 'the'],
    ['continuation', 'marker']]]

I hope to release a beta containing this in the next few days, and
will then start working on documentation.  When the docs are done I
will release a new version.

If you want to try this now, you can get the code from the hg repo -


What's so Neat...

From: andrew cooke <andrew@...>

Date: Sat, 12 Sep 2009 12:44:10 -0400

...about this is that - despite some need to rewrite things - it all
fits into the existing LEPL architecture.  This is a "big deal"
because whitespace parsing mixes information between different levels
of the parser.  The presence of "(...)" or a continuation marker like
"\" influences what the whitespace "means", so while we can detect
indentation in the lexer, we cannot interpret it until the parser
itself is running.  But at the same time, we want to avoid the need to
explicitly add tokens for continuation markers and indentations
"inside" the definitions for statements, expressions etc - the line
structure should be as isolated as possible (imagine having to write a
grammar where between each word you need to include the possibility
that the continuation character appears at that particular point).

Another problem was the "global" state required to handle the current
indentation.  It turns out that LEPL's concept of monitors was a
perfect match for this.

Related to the above was the issue of how to provide a clean,
declarative syntax.  To do this I built on the ideas already
implemented for tokens, and extended streams with filters.  It took a
few iterations, but I am really happy with the final result.

And using LEPL's generic configuration and graph rewriting means that
these new extensions can be integrated with the existing code without
breaking other modules....

I'm *so* pleased this has worked :o)


More Offside Documentation

From: andrew cooke <andrew@...>

Date: Wed, 16 Sep 2009 22:20:00 -0400

There's now an initial draft of a new chapter at


Delayed due to State

From: andrew cooke <andrew@...>

Date: Sat, 19 Sep 2009 09:25:55 -0400

Offside support has been delayed slightly because it breaks when used
with memoisation.  This is because (I think) the current indentation
level is not taken into account by memoizers.

Consider the end of a block.  At the end of the block another line is
attempted.  This fails because the indentation is incorrect.  So the
block ends, decrementing the indentation level, and the line is tried
again outside the block.  However, *exactly* the same stream is used
for the line matcher in both cases.  So the second time the memoizer
for the line says "nope, we already know this failed".  When it should
have succeeded, because the indentation level is now correct.

The only clean solution I can see is to introduce the concept of
global (ie per thread) state (a dictionary) in which values (like
current indentation can be stored).  Memoizers then combine the hash
of that state with the hash of the stream to detect repetition.

But will that be sufficient?  What about when two such cases above are
nested?  Will the "inner" case be expanded?  I think so, but am not
100% sure.


Comment on this post