.. _operators:

Operators
=========

.. index:: unsupported operand type, operators, OperatorMatcher, SyntaxError
.. _caveatsandlimitations:

Caveats and Limitations
-----------------------

It is unfortunate, but realistic, that the chapter on operators should start
with some warnings to the user.

Operators --- things like ``&`` and ``|``, used to join matchers --- can help
produce grammars that are easier to read, easier to understand, and so less
likely to contain errors.  But their implementation pushes Python's
boundaries, giving problems with precedence and applicability.  This is
exacerbated by the automatic coercion of strings to `Literal() `_ matchers
wherever possible.

For example, because operators are effectively methods on *neighbouring
objects*, the following will fail::

  >>> name = ('Mr' | 'Ms') // Word()
  [...]
  TypeError: unsupported operand type(s) for |: 'str' and 'str'

This is because neither ``'Mr'`` nor ``'Ms'`` is an `OperatorMatcher() `_
(which is where ``|`` is defined, via ``__or__`` and ``__ror__``).  At least
one operand must already be a matcher before the other can be coerced.

Another example, where precedence is not as we might hope::

  >>> name = ('Mr' // Word() > 'man' | 'Ms' // Word() > 'woman')
  [...]
  SyntaxError: The operator > for And('Mr', Transform, Transform) was applied to a matcher (Or('man', And)). Check syntax and parentheses.

because ``|`` binds more tightly than ``>``, so the expression is parsed by
Python as::

  >>> name = ('Mr' // Word()) > ('man' | ('Ms' // Word())) > 'woman'

and the SyntaxError was generated by Lepl, in an attempt to detect this kind
of error before the parser is called.

In short, then: use operators with care.  Many of the guidelines in the
:ref:`style` chapter are intended to help manage these problems.

.. index:: &, +, /, //, |, %

Binary Operators Between Matchers
---------------------------------

========  ===========
Operator  Description
========  ===========
``&``     Joins matchers in sequence.  The result is a single list containing
          the results from all matchers.  Identical (without separators) to
          `And() `_.
--------  -----------
``+``     As ``&``, but the results are then joined together with the standard
          Python ``+`` operator.
--------  -----------
``/``     As ``&``, but with optional spaces (0 or more) between matchers.  If
          no space is found, no result is added, otherwise any found spaces
          are joined together into a single result.
--------  -----------
``//``    As ``&``, but with required spaces (1 or more) between matchers.  The
          spaces are joined together into a single result.
--------  -----------
``|``     Matches one matcher from a list.  The result is the result of the
          chosen matcher.  Identical to `Or() `_.
--------  -----------
``%``     As ``|``, but without backtracking between matchers.  Identical to
          `First() `_.
========  ===========

For a discussion of backtracking see :ref:`backtracking`.

.. index:: ~, []

Prefix And Postfix Operators On Matchers
----------------------------------------

========  ===========
Operator  Description
========  ===========
``~``     Discards the result from the matcher.  Identical to `Drop() `_.
--------  -----------
``[]``    Repeats the matcher, with optional concatenation and separator.
          Identical (without separators) to `Repeat() `_ (see
          :ref:`previous section `).
========  ===========

.. note::

   `Lookahead() `_ is an exception for ``~`` (see :ref:`lookahead`).

.. index:: >=, >, >>, **, ^, args()
.. _ge:

Operators That Apply Functions To Results
-----------------------------------------

========  ===========
Operator  Description
========  ===========
``>=``    Pass the results of the matcher (left) to the given function (right)
          and use the result as the new result.  Identical to
          `Apply(raw=True) `_.
--------  -----------
``>``     Pass the results of the matcher (left) to the given function (right)
          and use the result, *within a new list*, as the result.  If the
          function is a string, a ``(string, result)`` pair is generated
          instead.  Identical to `Apply() `_.
--------  -----------
``args``  Not an operator, but used with ``>`` to expand the list of results
          to be arguments (like Python's ``*args`` convention).  For example
          ``> args(myFunc)`` invokes ``myFunc(*results)``.
--------  -----------
``>>``    As ``>``, but the function is applied to each result in turn
          (instead of all results being supplied in a single list argument).
          Identical to `Map() `_.
--------  -----------
``**``    As ``>``, but the results are passed as the named parameter
          *results*.  Additional keyword arguments are *stream_in* (the
          stream passed to the matcher), *stream_out* (the stream returned
          from the matcher) and *core* (see :ref:`resources`).  Identical to
          `KApply() `_.
--------  -----------
``^``     Raises a syntax error.  The argument to the right is a string that
          is treated as a format template for the same named arguments as
          ``**``.
========  ===========

.. _replacement:

Replacement
-----------

Operators can be replaced inside a ``with`` context using `Override() `_::

  >>> with Override(or_=And, and_=Or):
  ...     abcd = (Literal('a') & Literal('b')) | ( Literal('c') & Literal('d'))
  >>> print(abcd.parse('ac'))
  ['a', 'c']
  >>> print(abcd.parse('ab'))
  [...]
  lepl.stream.maxdepth.FullFirstMatchException: The match failed in at '' (line 1, character 3).

(think about it: inside the ``with`` block ``&`` builds an ``Or`` and ``|``
builds an ``And``, so ``abcd`` matches one of ``a`` or ``b`` followed by one
of ``c`` or ``d``).

It is also possible to provide a separator that is used for ``&`` and ``[]``.
With a little care (define matchers for characters before, and matchers for
sentences after, the ``with`` statement) this can handle the common case of
space-separated words in a transparent manner:

  >>> word = Letter()[:,...]
  >>> with Separator(r'\s+'):
  ...     sentence = word[1:]
  >>> sentence.parse('hello world')
  ['hello', ' ', 'world']

Note that there was no need to specify a separator in ``word[1:]``, and that
the argument of `Separator() `_ is a rare example of a string being coerced
to something other than a `Literal() `_ (here `Regexp() `_ is used).

The use of separators to handle spaces is discussed in more detail below.

.. index:: Separator(), SmartSeparator1(), SmartSeparator2(), DroppedSpace()
.. _spaces:

Spaces
------

There's a wide variety of ways to handle spaces in Lepl.  A large part of the
:ref:`Tutorial ` is spent discussing this, and it's probably the best place
to look for a basic understanding.

The main conclusion of the :ref:`Tutorial ` is that the :ref:`lexer` (i.e.
using `Token() `_) is the best approach in most circumstances.  It usually
hits the sweet spot between flexibility and simplicity.

Alternatively, to handle optional spaces (zero or more), without tokens, use
`DroppedSpace() `_::

  with DroppedSpace():
      addition = value & "+" & value

But sometimes these are not the right solution.  One case is
:ref:`table_example`, when the `Columns() `_ matcher is a good fit.  Another
is when spaces are *required*.

It is something of a "beginner's mistake" to enforce the use of spaces in the
grammar --- it makes the parsing more complex (and more fragile, even to
"good" input), and typically doesn't help the end user much.  But even so, it
is sometimes necessary.

In such cases, the only real solution is to specify all the spaces by hand.
One option is to use the ``/`` and ``//`` operators (which match zero or more
and one or more spaces respectively).

Alternatively, to save typing, Lepl includes various *separators*
(`DroppedSpace() `_, above, is a separator).

The :ref:`Tutorial ` introduced the basic `Separator() `_ (as described in
the previous section, above), which requires a user-specified space wherever
``&`` is used (and also in ``[]`` repetition).  But even this is often not
sufficient when optional matchers are used, because the spaces remain even
when the optional matcher is ignored.

So, to help automate the (rare) combination of *required* spaces, *automatic*
addition of spaces for each ``&``, and *optional* matchers, two "smart"
separators are also available.
The first, `SmartSeparator1() `_, checks whether a matcher is used by seeing
whether it consumes input; spaces are only added when ``&`` is between two
matchers that both "move along" the input stream.  The second,
`SmartSeparator2() `_, takes a more proactive approach and examines the
matchers to see whether they inherit from the base class used in Lepl to
implement "optionality".

All separators are implemented using :ref:`operator replacement `, described
above.

If you really, really need such functionality, the best thing to do is try
these various separators and see which has the behaviour you require (but
please first consider whether you absolutely need to check that spaces are
present, or whether you can do what you want more simply and reliably with
the :ref:`lexer`).

The following tables show the results of some simple tests for different
separators, spaces, and functions.  They also illustrate two separate, but
related, issues: the difference between `And() `_ and ``&`` when separators
are present; and how matchers like `Eos() `_ (which is not optional, but
consumes no input) behave.

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Optional('a') & Optional('b')                                                                                                                                                                 |
+----------+-----------------------------------------------------------+-----------------------------------------------------------+-----------------------------------------------------------+
|          |Separator                                                  |SmartSeparator1                                            |SmartSeparator2                                            |
|          +-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
|          |And(..., Eos())              |... & Eos()                  |And(..., Eos())              |... & Eos()                  |And(..., Eos())              |... & Eos()                  |
|          +--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|          |' '           |' '[:]        |' '           |' '[:]        |' '           |' '[:]        |' '           |' '[:]        |' '           |' '[:]        |' '           |' '[:]        |
+==========+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+
|'a b '    |              |              |yes           |yes           |              |              |              |              |              |              |yes           |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a b'     |yes           |yes           |              |yes           |yes           |yes           |yes           |yes           |yes           |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'ab'      |              |yes           |              |yes           |              |yes           |              |yes           |              |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|' b'      |yes           |yes           |              |yes           |              |              |              |              |              |              |              |              |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'b'       |              |yes           |              |yes           |yes           |yes           |yes           |yes           |yes           |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a '      |yes           |yes           |              |yes           |              |              |              |              |              |              |yes           |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a'       |              |yes           |              |yes           |yes           |yes           |yes           |yes           |yes           |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|''        |              |yes           |              |yes           |yes           |yes           |yes           |yes           |yes           |yes           |yes           |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|' '       |yes           |yes           |              |yes           |              |              |              |              |              |              |              |              |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+

Each table has a "yes" when the parser (at the top of the table) matches the
input stream (on the left).  Pay careful attention to spaces in the input.
Different columns of results correspond to the different separators, whether
they are matching a single space or "zero or more" spaces, and whether the
final `Eos() `_ matcher is added with ``&`` (which will include the spaces
from the separator) or `And() `_ (which won't).

So, for example, the final column on the right, below, has results for this
parser::

  with SmartSeparator2(Literal(' ')[:]):
      parser = Optional('a') & Optional('b') & 'c' & Eos()

(where `Literal( ) `_ is missing from the column heading to save space).

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Optional('a') & Optional('b') & 'c'                                                                                                                                                           |
+----------+-----------------------------------------------------------+-----------------------------------------------------------+-----------------------------------------------------------+
|          |Separator                                                  |SmartSeparator1                                            |SmartSeparator2                                            |
|          +-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
|          |And(..., Eos())              |... & Eos()                  |And(..., Eos())              |... & Eos()                  |And(..., Eos())              |... & Eos()                  |
|          +--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|          |' '           |' '[:]        |' '           |' '[:]        |' '           |' '[:]        |' '           |' '[:]        |' '           |' '[:]        |' '           |' '[:]        |
+==========+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+
|'a b c '  |              |              |yes           |yes           |              |              |              |              |              |              |yes           |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a b c'   |yes           |yes           |              |yes           |yes           |yes           |yes           |yes           |yes           |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|' b c'    |yes           |yes           |              |yes           |              |              |              |              |              |              |              |              |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'b c'     |              |yes           |              |yes           |yes           |yes           |yes           |yes           |yes           |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'ab c'    |              |yes           |              |yes           |              |yes           |              |yes           |              |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a c'     |              |yes           |              |yes           |yes           |yes           |yes           |yes           |yes           |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a  c'    |yes           |yes           |              |yes           |              |yes           |              |yes           |              |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'c'       |              |yes           |              |yes           |yes           |yes           |yes           |yes           |yes           |yes           |              |yes           |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|' c'      |              |yes           |              |yes           |              |              |              |              |              |              |              |              |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+

Finally, note that offside (significant whitespace) parsing is only supported
with tokens.  If you want to do it without tokens, you need to track the
indentation level and match the spaces yourself.
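
As a rough illustration of what tracking the level by hand might involve, the following sketch uses plain Python only. It is independent of Lepl and makes no claim about how Lepl's token-based offside support works; ``indent_events`` is a hypothetical helper, and it assumes indentation is written with spaces. It keeps a stack of indentation widths and reports where blocks open and close.

```python
def indent_events(lines):
    """Return a list of ('indent' | 'dedent' | 'line', value) tuples.

    Assumes indentation uses spaces only; blank lines are ignored.
    """
    stack = [0]          # indentation widths of the currently open blocks
    events = []
    for raw in lines:
        if not raw.strip():
            continue
        width = len(raw) - len(raw.lstrip(' '))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            events.append(('indent', width))
        while width < stack[-1]:
            # Shallower: close blocks until we are back at a known level.
            stack.pop()
            events.append(('dedent', stack[-1]))
        if width != stack[-1]:
            raise ValueError('inconsistent indentation: %r' % raw)
        events.append(('line', raw.strip()))
    while len(stack) > 1:
        # Close any blocks still open at the end of the input.
        stack.pop()
        events.append(('dedent', stack[-1]))
    return events


for event in indent_events(['a', '  b', '    c', 'd']):
    print(event)
```

A grammar written without tokens would need to do the equivalent bookkeeping while matching the leading spaces on each line, which is why the lexer is usually the simpler route.
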