# C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

© 2006-2015 Andrew Cooke (site) / post authors (content).

## 40-fold Speedup in LEPL Parsing

From: "andrew cooke" <andrew@...>

Date: Fri, 6 Mar 2009 19:51:54 -0300 (CLST)

OK, so this was a somewhat extreme test case, but by rearranging the
grammar I can make it more likely to terminate and so avoid lots of
pointless recursion.  Here's the code that does the rewriting - I'm
pleased at how simple it is, after all the work making matchers graphs,
etc:

def optimize_or(graph):
'''
When a left-recursive rule is used, it is much more efficient
if it appears last in an Or statement, since that forces the
alternates (which correspond to the terminating case in a
recursive function) to be tested before the LMemo limit is
reached.

This rewriting may change the order in which different results
for an ambiguous grammar are returned.
'''
for delayed in [x for x in preorder(graph)
if type(x) is Delayed]:
for loop in loops(delayed):
for i in range(len(loop)):
if isinstance(loop[i], Or):
# we cannot be at the end of the list here,
# since that is a Delayed instance
matchers = loop[i].matchers
target = loop[i+1]
# move target to end of list
index = matchers.index(target)
del matchers[index]
matchers.append(target)
return graph

Even on a more reasonable case (the LMemo example in the docs), the
speedup is about a factor of 2 (that includes some additional tweaks
elsewhere).

It's surprising how uniform the profiles are (the flip side of that is
that there's little to optimise from a code-tweaking pov).  The only other
improvement I can think of at the moment is adding a tokenizer (which will
Andrew