| Andrew Cooke | Contents | Latest | RSS | Twitter | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Lepl parser for Python.

Colorless Green.

Photography around Santiago.

SVG experiment.

Professional Portfolio

Calibration of seismometers.

Data access via web services.

Cache rewrite.

Extending OpenSSH.

C-ORM: docs, API.

Last 100 entries

Crypto AG DID work with NSA / GCHQ; UNUMS (Universal Number Format); MOOCs (Massive Open Online Courses); Interesting Looking Game; Euler's Theorem for Polynomials; Weeks 3-6; Reddit Comment; Differential Cryptanalysis For Dummies; Japanese Graphic Design; Books To Be Re-Read; And Today I Learned Bugs Need Clear Examples; Factoring a 67 bit prime in your head; Islamic Geometric Art; Useful Julia Backtraces from Tasks; Nothing, however, is lost with less discomfort than that which, when lost, cannot be missed; Article on Didion; Cost of Living by City; British Slavery; Derrida on Metaphor; African SciFi; Traits in Julia; Alternative Japanese Lit; Pulic Key as Address (Snow); Why Information Grows; The Blindness Of The Chilean Elite; Some Victoriagate Links; This Is Why I Left StackOverflow; New TLS Implementation; Maths for Physicists; How I Am 8; 1000 Word Philosophy; Cyberpunk Reading List; Detailed Discussion of Message Dispatch in ParserCombinator Library for Julia; FizzBuzz in Julia w Dependent Types; kokko - Design Shop in Osaka; Summary of Greece, Currently; LLVM and GPUs; See Also; Schoolgirl Groyps (Maths); Japanese Lit; Another Example - Modular Arithmetic; Music from United; Python 2 and 3 compatible alternative.; Read Agatha Christie for the Plot; A Constructive Look at TempleOS; Music Thread w Many Recommendations; Fixed Version; A Useful Julia Macro To Define Equality And Hash; k3b cdrom access, OpenSuse 13.1; Week 2; From outside, the UK looks less than stellar; Huge Fonts in VirtualBox; Keen - Complex Emergencies; The Fallen of World War II; Some Spanish Fiction; Calling C From Fortran 95; Bjork DJ Set; Z3 Example With Python; Week 1; Useful Guide To Starting With IJulia; UK Election + Media; Review: Reinventing Organizations; Inline Assembly With Julia / LLVM; Against the definition of types; Dumb Crypto Paper; The Search For Quasi-Periodicity...; Is There An Alternative To Processing?; CARDIAC (CARDboard Illustrative Aid to Computation); The Bolivian Case Against Chile At The Hague; Clear, Cogent Economic Arguments For Immigration; A Program To Say If I Am Working; Decent Cards For Ill People; New Photo; Luksic And Barrick Gold; President Bachelet's Speech; Baltimore Primer; libxml2 Parsing Stream; configure.ac Recipe For Library Path; The Davalos Affair For Idiots; Not The Onion: Google Fireside Chat w Kissinger; Bicycle Wheels, Inertia, and Energy; Another Tax Fraud; Google's Borg; A Verion That Redirects To Local HTTP Server; Spanish Accents For Idiots; Aluminium Cans; Advice on Spray Painting; Female View of Online Chat From a Male; UX Reading List; S4 Subgroups - Geometric Interpretation; Fucking Email; The SQM Affair For Idiots; Using Kolmogorov Complexity; Oblique Strategies in bash; Curses Tools; Markov Chain Monte Carlo Without all the Bullshit; Email Para Matias Godoy Mercado; The Penta Affair For Idiots; Example Code To Create numpy Array in C; Good Article on Bias in Graphic Design (NYTimes); Do You Backup github?

© 2006-2015 Andrew Cooke (site) / post authors (content).

Offside Parsing Works in LEPL

From: andrew cooke <andrew@...>

Date: Sat, 12 Sep 2009 12:14:54 -0400

I just got a complete test working for offside (whitespace/indentation
sensitive) parsing working in LEPL (my Python parser -
http://www.acooke.org/lepl)

What follows is a re-formatted version of a test from this file -
http://code.google.com/p/lepl/source/browse/src/lepl/offside/_test/pithon.py?spec=svn362f24c528fa6988e13953eebb1325956295696b&r=362f24c528fa6988e13953eebb1325956295696b


Here's the grammar (note that I have hardly any structure - there's no
clear definition of statements or commands or variables, it's just
enough to use the indentation-aware code):

# these are the basic tokens that the lexer
# recognises - whitespace is then handled
# automatically
word = Token(Word(Lower()))
continuation = Token(r'\\')
symbol = Token(Any('()'))

# the ~ here means these are used to match
# but discarded from the results
introduce = ~Token(':')
comma = ~Token(',')

# first we need to define how a single
# logical line can continue over many
# lines in the text
CLine = CLineFactory(continuation)

# if we don't want lines to continue,
# we could just use the BLine() matcher

# next a minimal language definition that
# says statements are sequence of words
statement = word[1:]

# argument lists can extend over multiple
# lines (the parser will "know" their extent
# because they are inside (...))
args = Extend(word[:, comma]) > tuple

# and a function header is some words followed
# by the argument list
function = \
  word[1:] & ~symbol('(') & args & ~symbol(')')


# now we get to the interesting part.  we
# introduce blocks, which are indented
# relative to the surrounding text
block = Delayed()

# and lines which are what are inside blocks.
# note that a block is a valid line
# because we can nest blocks, and an empty
# line can appear too.  finally we collect
# the output in a Python list so we can
# see the structrue in the result
line = Or(CLine(statement),
          block,
          Line(Empty()))        > list

# now we can define the block: it comes
# after a function header or statement
# (both those end in introduce - ":") and
# contains lines.
block += \
  CLine((function | statement) & introduce) \
  & Block(line[1:])

# and a program is a list of lines.
program = (line[:] & Eos())

# the usual LEPL way to make a parser,
# with a new configuration type. the
# policy argument is the number of spaces
# needed in an indent for a single block.
return program.string_parser(
  OffsideConfiguration(policy=2))


And here's the text that we will parse:

this is a grammar with a similar
line structure to python

if something:
  then we indent
else:
  something else

def function(a, b, c):
  we can nest blocks:
    like this
  and we can also \
    have explicit continuations \
    with \
any \
       indentation

same for (argument,
          lists):
  which do not need the
  continuation marker


Running the parser against that text gives the following, where the
nested lists indicate that we have matcher the block structure
correctly:

[ [],
  ['this', 'is', 'a', 'grammar', 'with', 'a', 'similar'],
  ['line', 'structure', 'to', 'python'],
  [],
  ['if', 'something',
    ['then', 'we', 'indent']],
  ['else',
    ['something', 'else'],
  []],
  ['def', 'function', ('a', 'b', 'c'),
    ['we', 'can', 'nest', 'blocks',
      ['like', 'this']],
    ['and', 'we', 'can', 'also', 'have', 'explicit',
     'continuations', 'with', 'any', 'indentation'],
    []],
  ['same', 'for', ('argument', 'lists'),
    ['which', 'do', 'not', 'need', 'the'],
    ['continuation', 'marker']]]


I hope to release a beta containing this in the next few days, and
will then start working on documentation.  When the docs are done I
will release a new version.

If you want to try this now, you can get the code from the hg repo -
http://code.google.com/p/lepl/source/checkout

Andrew

What's so Neat...

From: andrew cooke <andrew@...>

Date: Sat, 12 Sep 2009 12:44:10 -0400

...about this is that - despite some need to rewrite things - it all
fits into the existing LEPL architecture.  This is a "big deal"
because whitespace parsing mixes information between different levels
of the parser.  The presence of "(...)" or a continuation marker like
"\" influences what the whitespace "means", so while we can detect
indentation in the lexer, we cannot interpret it until the parser
itself is running.  But at the same time, we want to avoid the need to
explicitly add tokens for continuation markers and indentations
"inside" the definitions for statements, expressions etc - the line
structure should be as isolated as possible (imagine having to write a
grammar where between each word you need to include the possibility
that the continuation character appears at that particular point).

Another problem was the "global" state required to handle the current
indentation.  It turns out that LEPL's concept of monitors was a
perfect match for this.

Related to the above was the issue of how to provide a clean,
declarative syntax.  To do this I built on the ideas already
implemented for tokens, and extended streams with filters.  It took a
few iterations, but I am really happy with the final result.

And using LEPL's generic configuration and graph rewriting means that
these new extensions can be integrated with the existing code without
breaking other modules....

I'm *so* pleased this has worked :o)

Andrew

More Offside Documentation

From: andrew cooke <andrew@...>

Date: Wed, 16 Sep 2009 22:20:00 -0400

There's now an initial draft of a new chapter at
http://www.acooke.org/lepl/offside.html

Andrew

Delayed due to State

From: andrew cooke <andrew@...>

Date: Sat, 19 Sep 2009 09:25:55 -0400

Offside support has been delayed slightly because it breaks when used
with memoisation.  This is because (I think) the current indentation
level is not taken into account by memoizers.

Consider the end of a block.  At the end of the block another line is
attempted.  This fails because the indentation is incorrect.  So the
block ends, decrementing the indentation level, and the line is tried
again outside the block.  However, *exactly* the same stream is used
for the line matcher in both cases.  So the second time the memoizer
for the line says "nope, we already know this failed".  When it should
have succeeded, because the indentation level is now correct.

The only clean solution I can see is to introduce the concept of
global (ie per thread) state (a dictionary) in which values (like
current indentation can be stored).  Memoizers then combine the hash
of that state with the hash of the stream to detect repetition.

But will that be sufficient?  What about when two such cases above are
nested?  Will the "inner" case be expanded?  I think so, but am not
100% sure.

Andrew

Comment on this post