Lepl

Return home.

Contents

Lepl was a pure-Python, recursive descent parser. The first release was made in January 2009. Release 5.0.0 was installed over 4,000 times (PyPI statistics). Although still available, it has not been supported since June 2012.

Motivation

I had been interested in recursive descent parsing since reading Cousineau & Mauny, but felt that Python’s limited stack size (since increased) made it a poor choice for that language. This changed with the introduction of enhanced generators, which could be used to implement trampolining.

At the same time, I was disappointed with the existing pure-Python parsers (particularly pyparsing which seemed inconsistent and inelegant). So I decided to write a parser that would:

• be unlimited by stack use;

• parse any Python sequence (including lists and binary data);

• exploit caching (packrat parsing) and regular expressions for efficiency;

• (ab)use Python’s operator support to give an “embedded grammar”.

Results

The final library was open source with extensive documentation, including a manual, tutorial, and API.

This example shows Lepl at its best.

``````>>> class Factor(List): pass
>>> class Expression(List): pass

>>> expr   = Delayed()
>>> number = Digit()[1:,...]

>>> with DroppedSpace():
>>>     term    = number | '(' & expr & ')'
>>>     muldiv  = Any('*/')
>>>     factor  = term & (muldiv & term)[:]         > Factor
>>>     expr   += factor & (addsub & factor)[:]     > Expression
>>>     line    = expr & Eos()

>>> ast = line.parse_string('1 + 2 * (3 + 4 - 5)')[0]
>>> print(ast)
Expression
+- Factor
|   `- '1'
+- '+'
`- Factor
+- '2'
+- '*'
+- '('
+- Expression
|   +- Factor
|   |   `- '3'
|   +- '+'
|   +- Factor
|   |   `- '4'
|   +- '-'
|   `- Factor
|       `- '5'
`- ')'``````

Lessons Learned

Positive

The “embedded grammar” worked well, as did some more experimental ideas, like automatic tracing of variables (via frame introspection) and (mis-)use of the array syntax to specify how often a pattern should repeat.

Negative

Lepl failed on two related fronts:

1. It was too complex: working well (eg. providing useful error messages that include location) with arbitrary sequences is hard; trampolining in Python is not as transparent as it could be; adding caching to trampolines is hard. And I didn’t find simple, clean solutions to the hard problems.

2. It was not popular enough. Partly, because pyparsing has already become dominant. Partly because the code was too complex for other people to start hacking. And partly because I am not so good at encouraging and supporting a community.