Package lepl :: Package regexp :: Module core
[hide private]
[frames] | no frames]

Module core

source code

A simple regular expression engine in pure Python.

This takes a set of regular expressions (only the most basic functionality is supported) and generates a finite state machines that can match them against a stream of values.

Although simple (and slow compared to a C version), it has some advantages from being implemented in Python.

The classes here form a layered series of representations for regular expressions.

The first layer contains the classes Sequence, Option, Repeat, Choice, Labelled and Regexp. These are a high level representation that will typically be constructed by an alphabet-specific parser.

The second layer encodes the regular expression as a non-deterministic finite automaton (NFA) with empty transitions. This is a "natural" representation of the transitions that can be generated by the first layer.

The third layer encodes the regular expression as a deterministic finite automaton (DFA). This is a more "machine friendly" representation that allows for more efficient matching. It is generated by transforming a representation from the second layer.

Classes [hide private]
Regular expressions are generalised over alphabets, which describe the set of acceptable characters.
A sequence of Characters, etc.
An optional sequence of Characters (or sequences).
Matches an empty sequence.
A sequence of Characters (or sequences) that can repeat 0 or more times.
A set of alternative Characters (or sequences).
A labelled sequence.
An error associated with (problems in) the regexp implementation.
Compile an expression.
Describes a collection of connected nodes.
Describes a NFA with epsilon (empty) transitions.
Given a graph this constructs a transition table and an associated matcher.
Describes a DFA where each node is a collection of NFA nodes.
Convert a NFA graph to a DFA graph (uses the usual superset approach but does combination of states in a way that seems to fit better with the idea of character ranges).
Create a lookup table for a DFA and a matcher to evaluate it.
Functions [hide private]
Choice(alphabet, *children)
Encase a choice in a sequence so that it can be treated in the same way as other components (if we don't do this, then passing it to another sequence will split it into constituents and then sequence them).
source code
Variables [hide private]
  _Alphabet = ABCMeta('_Alphabet', (object,), {})
  RegexpGraphNode = ABCMeta('RegexpGraphNode', (object,), {})