Matchers

The API Documentation contains an exhaustive list of the matches available. This chapter only describes the most important.

The final section gives some implementation details.

Literal

[API] This matcher identifies a given string. For example, Literal('hello') will give the result “hello” when that text is at the start of a stream:

>>> Literal('hello').parse_string('hello world')
['hello']

In many cases it is not necessary to use Literal() explicitly. Most matchers, when they receive a string as a constructor argument, will automatically create a literal match from the given text.

Any

[API] This matcher identifies any single character. It can be restricted to match only characters that appear in a given string. For example:

>>> Any().parse_string('hello world')
['h']

>>> Any('abcdefghijklm')[0:].parse_string('hello world')
['h', 'e', 'l', 'l']

And (&)

[API] This matcher combines other matchers in order. For example:

>>> And(Any('h'), Any()).parse_string('hello world')
['h', 'e']

All matchers must succeed for And() as a whole to succeed:

>>> And(Any('h'), Any('x')).parse_string('hello world')
None
>>> (Any('h') & Any('e')).parse_string('hello world')
['h', 'e']

The last example above shows how & can also be used — it is equivalent unless a separator is being used.

Or (|)

[API] This matcher searches through a list of other matchers to find a successful match. For example:

>>> Or(Any('x'), Any('h'), Any('z')).parse_string('hello world')
['h']

The first match found is the one returned:

>>> Or(Any('h'), Any()[3]).parse_string('hello world')
['h']

But with the “match” rather than the “parse” methods, subsequent calls return the other possibilities:

>>> matcher = Or(Any('h'), Any()[3]).match('hello world')
>>> next(matcher)
(['h'], 'ello world')
>>> next(matcher)
(['h', 'e', 'l'], 'lo world')

This shows how Or() supports “backtracking”. LEPL may call a matcher times before a result is found that “fits” with the rest of the grammar.

Repeat ([...])

[API] This matcher repeats another matcher a given number of times. The second and third arguments are the minimum and maximum number of times that the matcher must repeat (these are the first two indices of the related [] operator).

For example:

>>> Repeat(Any(), 3, 3).parse_string('12345')
['1', '2', '3']
>>> Any()[3:3].parse_string('12345')
['1', '2', '3']

If only a lower bound to the number of repeats is given the match will be repeated as often as possible:

>>> Repeat(Any(), 3).parse_string('12345')
['1', '2', '3', '4', '5']
>>> Any()[3:].parse_string('12345')
['1', '2', '3', '4', '5']

If the match cannot be repeated the requested number of times no result is returned:

>>> Repeat(Any(), 3).parse_string('12')
None

When used with the “match” methods, different numbers of matches are available on subsequent calls (backtracking):

>>> matcher = Repeat(Any(), 3).match('12345')
>>> next(matcher)
(['1', '2', '3', '4', '5'], '')
>>> next(matcher)
(['1', '2', '3', '4'], '5')
>>> next(matcher)
(['1', '2', '3'], '45')
>>> next(matcher)
StopIteration

By default a depth–first search is used (giving the longest match first). Specifying a fourth argument (the increment when used with [] syntax) of 'b' gives breadth–first search (shortest first):

>>> matcher = Repeat(Any(), 3, None, 'b').match('12345')
>>> next(matcher)
(['1', '2', '3'], '45')
>>> next(matcher)
(['1', '2', '3', '4'], '5')
>>> next(matcher)
(['1', '2', '3', '4', '5'], '')
>>> next(matcher)
StopIteration

Lookahead

[API] This matcher checks whether another matcher would succeed, but returns the original stream with an empty result list.

>>> next(Lookahead(Literal('hello')).match('hello world'))
([], 'hello world')
>>> Lookahead(Literal('hello')).parse('hello world')
[]

It fails if the match would not be possible (specifying a string as matcher is equivalent to using Literal()):

>>> Lookahead('hello').parse_string('goodbye cruel world')
None

When preceded by a ~ the logic is reversed:

>>> (~Lookahead('hello')).parse_string('hello world')
None
>>> (~Lookahead('hello')).parse_string('goodbye cruel world')
[]

Note

Because ~ binds less strongly than method invocation extra parentheses are needed above.

Note

This change in behaviour is specific to Lookahead() — usually ~ applies Drop() as described below.

Drop (~)

[API] This matcher calls another matcher, but discards the results:

>>> (Drop('hello') / 'world').parse_string('hello world')
[' ', 'world']
>>> (~Literal('hello') / 'world').parse_string('hello world')
[' ', 'world']

(The empty string in the result is from / which joins two matchers together, with optional spaces between).

This is different to Lookahead() because the matcher after Drop() receives a stream that has “moved on” to the next part of the input. With Lookahead() the stream is not advanced and so this example will fail:

>>> (Lookahead('hello') / 'world').parse_string('hello world')
None

Apply (>, >=, args)

[API] This matcher passes the results of another matcher to a function, then returns the value from the function as a new result:

>>> def show(results):
...     print('results:', results)
...     return results
>>> Apply(Any()[:,...], show).parse_string('hello world')
results: ['hello world']
[['hello world']]

The > operator is equivalent:

>>> (Any()[:,...] > show).parse_string('hello world')
results: ['hello world']
[['hello world']]

The returned result is placed in a new list, which is not always what is wanted (it is useful when you want Nested Lists); setting raw=True uses the result directly:

>>> Apply(Any()[:,...], show, raw=True).parse_string('hello world')
results: ['hello world']
['hello world']
>>> (Any()[:,...] >= show).parse_string('hello world')
results: ['hello world']
['hello world']

Setting another optional argument, args, to True changes the way the function is called. Instead of passing the results as a single list each is treated as a separate argument. This is familiar as the way *args works in Python:

>>> def format3(a, b, c):
...   return 'a: {0}; b: {1}; c: {2}'.format(a, b, c)
>>> Apply(Any()[3], format3, args=True).parse('xyz')
['a: x; b: y; c: z']

There’s no operator equivaluent for this, but a little helper function called args() allows > to be reused:

>>> (Any()[3] > args(format3)).parse('xyz')
['a: x; b: y; c: z']

KApply (**)

[API] This matcher passes the results of another matcher to a function, along with additional information about the match, then returns the value from the function as a new result. Unlike Apply(), this names the arguments as follows:

stream_in
The stream passed to the matcher before matching.
stream_out
The stream returned from the matcher after matching.
results
A list of the results returned.

Implementation Details

All matchers accept a stream of data and return a generator. The generator will supply a sequence of ([results], stream) pairs, where results depends on the matcher and the new stream continues from after the matched text [*].

A matcher may succeed, but provide no results — the generator will return a tuple containing an empty list and the new stream. When there are no more possible matches, the generator will exit.

Most simple matchers will return a generator that yields a single value. Generators that return multiple values are used in backtracking. For example, the Or() generator may yield once for each sub–match in turn (in practice some sub-matchers may return generators that themselves return many values, while others may fail immediately, so it is not a direct 1–to–1 correspondence).

(It is probably obvious if you have used combinator libraries before, but worth mentioning anyway: all matchers implement this same interface, whether they are “fundamental” — do the real work of matching against the stream — or delegate work to other sub–matchers, or modify results. This consistency is a source of great expressive power.)

Existing matchers take care to exploit the common interface between lists and strings, so matching should work on a variety of streams, including inhomogeneous lists of objects.

All matcher implementations should subclass the ABC Matcher. Most will do so by inheriting from BaseMatcher which provides support for operators.

[*]I am intentionally omitting details about trampolining here to focus on the process of matching. A more complete description of the entire implementation can be found in Trampolining.

Table Of Contents

Previous topic

Getting Started

Next topic

Operators

This Page