# C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

© 2006-2015 Andrew Cooke (site) / post authors (content).

## RXPY Update - Beam Engine

From: andrew cooke <andrew@...>

Date: Fri, 16 Jul 2010 14:07:59 -0400

A short update on the pure-Python RXPY regular expression engine.

I just got the beam-width engine working.  This uses the "re2" approach, in
a very general sense, but differs in that:
- all state is tracked (so nothing is excluded from regular expressions)
- the maximum number of states is restricted using a "beam search" approach

The aim has been to investigate different high-level, full featured
approaches, before worrying about low-level optimisations.  And the latest
results are pretty cool - all my stress tests are handled nicely by the beam
search approach.  I'm sure I could construct problem cases, but this is
already better than a backtracking approach.

Next step is to define a set of benchmarks and formally compare the different
approaches.  After that, with uncertain priority, I need to (a) start
discarding features for speed and (b) look at pypy.

I was going to end there, but realised I haven't explained what I mean by
"beam search".  The idea is that "re2" accumulates different states,
effectively doing NFA->DFA conversion "on the fly", but that the number of
states is restricted (to 1,2,4,8... states).  This reduces the time spent
searching for (possibly exponential?) alternatives in most normal cases.

Andrew