Operators

Caveats and Limitations

It is unfortunate, but realistic, that the chapter on operators should start with some warnings to the user.

Operators — things like & and |, used to join matchers — can help produce grammars that are easier to read, easier to understand, and so less likely to contain errors. But their implementation pushes Python’s boundaries, giving problems with precedence and applicability. This is exacerbated by the automatic coercion of strings to Literal() matchers wherever possible.

For example, because operators are effectively methods on neighbouring objects, the following will fail:

>>> name = ('Mr' | 'Ms') // Word()
[...]
TypeError: unsupported operand type(s) for |: 'str' and 'str'

This is because neither 'Mr' nor 'Ms' is an instance of OperatorMatcher() (which is where | is defined, via __or__ and __ror__); both are plain strings, so Python never reaches Lepl's operator code.
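The mechanism can be shown without Lepl. The following is a minimal sketch, not Lepl's implementation: ToyMatcher, ToyLiteral, ToyOr and coerce are hypothetical stand-ins that only mimic how __or__ and __ror__ let one matcher "adopt" a neighbouring string.

```python
class ToyMatcher:
    """Hypothetical base class that defines | (not Lepl's OperatorMatcher)."""

    def __or__(self, other):
        # Called for: matcher | something
        return ToyOr(self, coerce(other))

    def __ror__(self, other):
        # Called for: something | matcher, when `something` has no usable |
        return ToyOr(coerce(other), self)


class ToyLiteral(ToyMatcher):
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"ToyLiteral({self.text!r})"


class ToyOr(ToyMatcher):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __repr__(self):
        return f"ToyOr({self.left!r}, {self.right!r})"


def coerce(value):
    """Turn a plain string into a ToyLiteral; leave matchers unchanged."""
    return ToyLiteral(value) if isinstance(value, str) else value


left_is_matcher = ToyLiteral('Mr') | 'Ms'     # uses __or__
right_is_matcher = 'Mr' | ToyLiteral('Ms')    # uses __ror__

try:
    'Mr' | 'Ms'                               # no matcher on either side
    both_strings_error = None
except TypeError as error:
    both_strings_error = error

print(left_is_matcher)
print(right_is_matcher)
print(type(both_strings_error).__name__)
```

As long as one operand is a matcher the string next to it is coerced; with two strings there is nothing for Python to dispatch to. In the failing example above, wrapping either string explicitly (for example Literal('Mr')) gives | a matcher to work with.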

Another example, where precedence is not as we might hope:

>>> name = ('Mr' // Word() > 'man' | 'Ms' // Word() > 'woman')
[...]
SyntaxError: The operator > for And('Mr', Transform, Transform) was applied to a matcher (Or('man', And)). Check syntax and parentheses.

because | binds more tightly than >, and Python chains consecutive comparisons, so the expression is parsed as:

>>> name = ('Mr' // Word()) > ('man' | ('Ms' // Word())) > 'woman'

and the SyntaxError was generated by Lepl, in an attempt to detect this kind of error before the parser is called.
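One way to check how Python groups an expression like this is to ask the standard ast module. This is a small sketch, assuming Python 3.9 or later (for ast.unparse); the expression is only parsed, never evaluated, so Word is just a name in the text and Lepl is not needed.

```python
import ast

source = "'Mr' // Word() > 'man' | 'Ms' // Word() > 'woman'"
tree = ast.parse(source, mode="eval").body

# A chained comparison is stored as one left operand followed by a list of
# further operands, with one comparison operator between each pair.
operands = [tree.left] + tree.comparators
parts = [ast.unparse(node) for node in operands]

print(type(tree).__name__)   # type of the top-level node
for part in parts:
    print(part)              # one operand of the chain per line
```

The top-level node is a comparison with three operands, and the middle operand is the whole | expression, which is why the first > ends up applied to a matcher rather than to the string 'man'.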

In short, then: use operators with care. Many of the guidelines in the Patterns chapter are intended to help manage these problems.

Binary Operators Between Matchers

Operator Description
& Joins matchers in sequence. The result is a single list containing the results from all functions. Identical (without separators) to And().
+ As &, but the results are then joined together with the standard Python + operator.
/ As &, but with optional spaces (0 or more) between functions. If no space is found, no result is added, otherwise any found spaces are joined together into a single result.
// As &, but with required spaces (1 or more) between functions. The spaces are joined together into a single result.
| Matches one matcher from a list. The result is the result of the chosen matcher. Identical to Or().
% As |, but without backtracking between functions. Identical to First().

For a discussion of backtracking see Search and Backtracking.

Prefix And Postfix Operators On Matchers

Operator Description
~ Discards the result from the matcher. Identical to Drop().
[] Repeats the matcher, with optional concatenation and separator. Identical (without separators) to Repeat() (see previous section).

Operators That Apply Functions To Results

Operator Description
>= Pass the results of the matcher (left) to the given function (right) and use the result as the new result. Identical to Apply(raw=True).
> Pass the results of the matcher (left) to the given function (right) and use the result, within a new list, as the result. If the function is a string a (string, result) pair is generated instead. Identical to Apply().
args Not an operator, but used with > to expand the list of results to be arguments (like Python’s *args convention). For example > args(myFunc) invokes myFunc(*results).
>> As >, but the function is applied to each result in turn (instead of all results being supplied in a single list argument). Identical to Map().
** As >, but the results are passed as the named parameter results. Additional keyword arguments are stream_in (the stream passed to the matcher), stream_out (the stream returned from the matcher) and core (see Resource Management). Identical to KApply().
^ Raise a Syntax error. The argument to the right is a string that is treated as a format template for the same named arguments as **.
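The difference between >, >= and >> is easiest to see on a plain list of results. The functions below are a hypothetical model of the descriptions in the table, not Lepl code; apply, apply_raw, map_each and this args are illustrative names, and the string form of > and the keyword arguments of ** are left out.

```python
def apply(results, function):
    """Like >: call function on the whole list, wrap the value in a new list."""
    return [function(results)]


def apply_raw(results, function):
    """Like >=: call function on the whole list, use the value as the result."""
    return function(results)


def map_each(results, function):
    """Like >>: call function on each result in turn."""
    return [function(result) for result in results]


def args(function):
    """Like args(): unpack the list of results into positional arguments."""
    return lambda results: function(*results)


results = ['1', '2']

counted = apply(results, len)                                 # > len
reversed_raw = apply_raw(results, lambda rs: rs[::-1])        # >= function
as_ints = map_each(results, int)                              # >> int
total = apply(results, args(lambda a, b: int(a) + int(b)))    # > args(function)

print(counted, reversed_raw, as_ints, total)
```

In this model the function given to >= is responsible for returning a list itself, while > always wraps whatever the function returns.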

Replacement

Operators can be replaced inside a with context using Override():

>>> with Override(or_=And, and_=Or):
...     abcd = (Literal('a') & Literal('b')) | (Literal('c') & Literal('d'))
>>> print(abcd.parse('ac'))
['a', 'c']
>>> print(abcd.parse('ab'))
[...]
lepl.stream.maxdepth.FullFirstMatchException: The match failed in <string> at '' (line 1, character 3).

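(think about it: inside the with block & builds an Or() and | builds an And(), so abcd is equivalent to And(Or('a', 'b'), Or('c', 'd'))).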

It is also possible to provide a separator that is used for & and []. With a little care (define matchers for characters before, and matchers for sentences after, the with statement) this can handle the common case of space-separated words in a transparent manner:

>>> word = Letter()[:,...]
>>> with Separator(r'\s+'):
...     sentence = word[1:]
>>> sentence.parse('hello world')
['hello', ' ', 'world']

Note that there was no need to specify a separator in word[1:], and that the argument of Separator() is a rare example of a string being coerced to something other than a Literal() (here Regexp() is used).

The use of separators to handle spaces is discussed in more detail below.

Spaces

There’s a wide variety of ways to handle spaces in Lepl. A large part of the Tutorial is spent discussing this, and it’s probably the best place to look for a basic understanding.

The main conclusion of the Tutorial is that the Lexer (i.e. using Token()) is the best approach in most circumstances. It usually hits the sweet spot between flexibility and simplicity.

Alternatively, to handle optional spaces (zero or more), without tokens, use DroppedSpace():

with DroppedSpace():
    addition = value & "+" & value

But sometimes these are not the right solution. One case is Tabular Data, when the Columns() matcher is a good fit. Another is when spaces are required.

It is something of a “beginner’s mistake” to enforce the use of spaces in the grammar — it makes the parsing more complex (and more fragile, even to “good” input), and typically doesn’t help the end user much. But even so, it is sometimes necessary.

In such cases, the only real solution is to specify all the spaces by hand. One option is to use the / and // operators (which match zero-or-more and one-or-more spaces respectively). Alternatively, to save typing, Lepl includes various separators (DroppedSpace(), above, is a separator). The Tutorial introduced the basic Separator() (described in the previous section), which requires a user-specified space wherever & is used (and also in [] repetition).

But even this is often not sufficient when optional matchers are used, because the spaces remain even when the optional matcher is ignored.

So, to help automate the (rare) case of required spaces, where a space must be added for each & but optional matchers must also be handled, two “smart” separators are available. The first, SmartSeparator1(), checks whether a matcher is used by seeing whether it consumes input; spaces are only added when & is between two matchers that both “move along” the input stream. The second, SmartSeparator2(), takes a more proactive approach and examines the matchers to see whether they inherit from the base class used in Lepl to implement “optionality”.
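The "consumes input" rule can be illustrated with a few toy matchers. This is a simplified sketch of the idea only, not Lepl's code: literal, optional, plain_and, smart_and and accepts are hypothetical helpers, there is no backtracking, and the separator is a single required space.

```python
def literal(text):
    """Matcher for fixed text: returns the new position, or None on failure."""
    def match(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return match


def optional(matcher):
    """Matcher that falls back to consuming nothing."""
    def match(s, pos):
        end = matcher(s, pos)
        return pos if end is None else end
    return match


def plain_and(left, right, sep):
    """Separator is always required between left and right."""
    def match(s, pos):
        for matcher in (left, sep, right):
            pos = matcher(s, pos)
            if pos is None:
                return None
        return pos
    return match


def smart_and(left, right, sep):
    """Separator is required only if both left and right consume input."""
    def match(s, pos):
        mid = left(s, pos)
        if mid is None:
            return None
        if mid > pos:                          # left consumed something
            after_sep = sep(s, mid)
            if after_sep is not None:
                end = right(s, after_sep)
                if end is not None and end > after_sep:
                    return end                 # both consumed, separator used
            end = right(s, mid)                # otherwise right must be empty
            return end if end == mid else None
        return right(s, mid)                   # left was empty: no separator
    return match


def accepts(parser, text):
    """True if the parser matches the whole of text."""
    return parser(text, 0) == len(text)


space = literal(' ')
plain = plain_and(optional(literal('a')), optional(literal('b')), space)
smart = smart_and(optional(literal('a')), optional(literal('b')), space)

for text in ['a b', 'ab', ' b', 'b', 'a ', 'a', '', ' ']:
    print(repr(text), accepts(plain, text), accepts(smart, text))
```

With the plain version a lone 'b' is rejected because the space is still demanded after the missing 'a'; with the "smart" version 'b' and 'a' are accepted and a stray space is rejected.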

All separators are implemented using operator replacement, described above.

If you really, really need such functionality, the best thing to do is try these various separators and see which has the behaviour you require (but please first consider whether you absolutely need to check that spaces are present, or whether you can do what you want more simply and reliably with the Lexer).

The following tables show the results of some simple tests for different separators, spaces, and functions. They also illustrate two separate, but related, issues: the difference between And() and & when separators are present; and how matchers like Eos() function (which is not optional, but consumes no input).

Optional('a') & Optional('b')

            Separator                       SmartSeparator1                 SmartSeparator2
            And(..., Eos()) ... & Eos()     And(..., Eos()) ... & Eos()     And(..., Eos()) ... & Eos()
Input       ' '     ' '[:]  ' '     ' '[:]  ' '     ' '[:]  ' '     ' '[:]  ' '     ' '[:]  ' '     ' '[:]
'a b '      -       -       yes     yes     -       -       -       -       -       -       yes     yes
'a b'       yes     yes     -       yes     yes     yes     yes     yes     yes     yes     -       yes
'ab'        -       yes     -       yes     -       yes     -       yes     -       yes     -       yes
' b'        yes     yes     -       yes     -       -       -       -       -       -       -       -
'b'         -       yes     -       yes     yes     yes     yes     yes     yes     yes     -       yes
'a '        yes     yes     -       yes     -       -       -       -       -       -       yes     yes
'a'         -       yes     -       yes     yes     yes     yes     yes     yes     yes     -       yes
''          -       yes     -       yes     yes     yes     yes     yes     yes     yes     yes     yes
' '         yes     yes     -       yes     -       -       -       -       -       -       -       -

Each table has a “yes” when the parser (at the top of the table) matches the input stream (on the left). Pay careful attention to spaces in the input.

Different columns of results correspond to the different separators, whether they are matching a single space or “zero or more” spaces, and whether the final Eos() matcher is added with & (which will include the spaces from the separator) or And() (which won’t).

So, for example, the final column on the right, below, has results for this parser:

with SmartSeparator2(Literal(' ')[:]):
    parser = Optional('a') & Optional('b') & 'c' & Eos()

(where Literal( ) is missing from the column heading to save space).

Optional('a') & Optional('b') & 'c'

            Separator                       SmartSeparator1                 SmartSeparator2
            And(..., Eos()) ... & Eos()     And(..., Eos()) ... & Eos()     And(..., Eos()) ... & Eos()
Input       ' '     ' '[:]  ' '     ' '[:]  ' '     ' '[:]  ' '     ' '[:]  ' '     ' '[:]  ' '     ' '[:]
'a b c '    -       -       yes     yes     -       -       -       -       -       -       yes     yes
'a b c'     yes     yes     -       yes     yes     yes     yes     yes     yes     yes     -       yes
' b c'      yes     yes     -       yes     -       -       -       -       -       -       -       -
'b c'       -       yes     -       yes     yes     yes     yes     yes     yes     yes     -       yes
'ab c'      -       yes     -       yes     -       yes     -       yes     -       yes     -       yes
'a c'       -       yes     -       yes     yes     yes     yes     yes     yes     yes     -       yes
'a  c'      yes     yes     -       yes     -       yes     -       yes     -       yes     -       yes
'c'         -       yes     -       yes     yes     yes     yes     yes     yes     yes     -       yes
' c'        -       yes     -       yes     -       -       -       -       -       -       -       -

Finally, note that offside (significant whitespace) parsing is only supported with tokens. If you want to do it without, you need to somehow work out how to track the level and match the spaces yourself.
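To give a rough idea of what "tracking the level yourself" involves, here is a generic sketch of indentation tracking with a stack. It is not part of Lepl and does not use any Lepl API; indent_events and the INDENT and DEDENT markers are hypothetical names, and only leading spaces (not tabs) are counted.

```python
def indent_events(lines):
    """Turn leading spaces into INDENT/DEDENT markers around each line's text."""
    stack = [0]                      # indentation widths of the open blocks
    events = []
    for line in lines:
        if not line.strip():
            continue                 # blank lines do not change the level
        width = len(line) - len(line.lstrip(' '))
        if width > stack[-1]:
            stack.append(width)
            events.append('INDENT')
        while width < stack[-1]:
            stack.pop()
            events.append('DEDENT')
        if width != stack[-1]:
            raise ValueError(f"inconsistent indentation: {line!r}")
        events.append(line.strip())
    events.extend('DEDENT' for _ in stack[1:])   # close any open blocks
    return events


example = ["a:", "  b", "  c:", "    d", "e"]
events = indent_events(example)
print(events)
```

A grammar without tokens would have to carry equivalent state while matching the leading spaces of every line, which is the bookkeeping the Lexer-based offside support does for you.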
