Class ConfigBuilder (lepl.core.config)

Accumulate configuration through chained methods.
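
For orientation, a minimal usage sketch (the grammar and input are illustrative, not from this documentation): a ConfigBuilder is normally reached through a matcher's config attribute, and each method below returns the builder, so calls can be chained.

    from lepl import *

    # Illustrative grammar: a comma-separated list of integers.
    value = Integer()
    csv = value & (~Literal(',') & value)[:]

    # Configuration methods accumulate and chain; each returns the builder.
    csv.config.no_memoize().compile_to_nfa()

    print(csv.parse('1,2,3'))   # -> ['1', '2', '3']
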
Instance Methods

__init__(self, matcher)
x.__init__(...) initializes x; see help(type(x)) for signature

__start(self)
Set default values on demand to avoid a dependency loop.

add_rewriter(self, rewriter)
Add a rewriter that will be applied to the matcher graph when the parser is generated.

remove_rewriter(self, rewriter)
Remove a rewriter from the current configuration.

remove_all_rewriters(self, type_=None)
Remove all rewriters of a given type from the current configuration.

add_monitor(self, monitor)
Add a monitor to the current configuration.

remove_all_monitors(self)
Remove all monitors from the current configuration.

stream_factory(self, stream_factory=DEFAULT_STREAM_FACTORY)
Specify the stream factory.

add_stream_kargs(self, **kargs)
Add a value for passing to the stream factory.

remove_all_stream_kargs(self)
Remove all values passed to the stream factory.

configuration(self)
The current configuration (rewriters, monitors, stream_factory).

__get_alphabet(self)
Get the alphabet used.

alphabet(self, alphabet)
Set the alphabet used.

changed(self)
Has the config been changed by the user since it was last returned via configuration? If not, any previously generated parser can be reused.

clear_cache(self)
Force calculation of a new parser.

set_arguments(self, type_, **kargs)
Set the given keyword arguments on all matchers of the given type_ (i.e. class) in the grammar.

no_set_arguments(self)
Remove all rewriters that set arguments.

set_alphabet_arg(self, alphabet=None)
Set the alphabet on various matchers.

full_first_match(self, eos=True)
Raise an error if the first match fails.

no_full_first_match(self)
Disable the automatic generation of an error if the first match fails.

flatten(self)
Combine nested And() and Or() matchers.

no_flatten(self)
Disable the combination of nested And() and Or() matchers.

compile_to_dfa(self, force=False, alphabet=None)
Compile simple matchers to DFA regular expressions.

compile_to_nfa(self, force=False, alphabet=None)
Compile simple matchers to NFA regular expressions.

compile_to_re(self, force=False, alphabet=None)
Compile simple matchers to re (C library) regular expressions.

no_compile_to_regexp(self)
Disable compilation of simple matchers to regular expressions.

optimize_or(self, conservative=False)
Rearrange arguments to Or() so that left-recursive matchers are tested last.

no_optimize_or(self)
Disable the re-ordering of some Or() arguments.

lexer(self, alphabet=None, discard=None, lexer=None)
Detect the use of Token() and modify the parser to use the lexer.

no_lexer(self)
Disable support for the lexer.

direct_eval(self, spec=None)
Combine simple matchers so that they are evaluated without trampolining.

no_direct_eval(self)
Disable direct evaluation.

compose_transforms(self)
Combine transforms (functions applied to results) with matchers.

no_compose_transforms(self)
Disable the composition of transforms.

auto_memoize(self, conservative=None, full=True, d=0)
This configuration attempts to detect which memoizer is most effective for each matcher.

left_memoize(self, d=0)
Add memoization that may detect and stabilise left-recursion.

right_memoize(self)
Add memoization that can make some complex parsers (with a lot of backtracking) more efficient.

no_memoize(self)
Remove memoization.

lines(self, discard=None, tabsize=8, block_policy=None, block_start=None)
Configure "offside parsing".

trace_stack(self, enabled=False)
Add a monitor to trace results using TraceStack().

trace_variables(self)
Add a monitor to correctly insert the transforms needed when using the TraceVariables() context.

low_memory(self, queue_len=100)
Reduce memory use (at the expense of backtracking).

cache_level(self, level=1)
Control when the stream can be cached internally (this is used for debugging and error messages); streams are cached for debugging when the value is greater than zero.

record_deepest(self, n_before=6, n_results_after=2, n_done_after=2)
Add a monitor to record the deepest match.

clear(self)
Delete any earlier configuration and disable the default (so no rewriters or monitors are used).

default(self)
Provide the default configuration (deleting what may have been configured previously).

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Properties

Inherited from object: __class__

Method Details

__init__(self, matcher) (Constructor)

x.__init__(...) initializes x; see help(type(x)) for signature
Overrides: object.__init__ (inherited documentation)

add_monitor(self, monitor)

Add a monitor to the current configuration. Monitors are called from within the trampolining process and can be used to track evaluation, control resource use, etc.

stream_factory(self, stream_factory=DEFAULT_STREAM_FACTORY)

Specify the stream factory. This is used to generate the input stream for the parser.

configuration(self)

The current configuration (rewriters, monitors, stream_factory).
Decorators:
  • @property

__get_alphabet(self)

Get the alphabet used.

Typically this is Unicode, which is the default. It is needed for the generation of regular expressions.

alphabet(self, alphabet)

Set the alphabet used. It is needed for the generation of regular expressions, for example (but the default, for Unicode, is usually sufficient).

changed(self)

Has the config been changed by the user since it was last returned via configuration? If not, any previously generated parser can be reused.
Decorators:
  • @property

set_alphabet_arg(self, alphabet=None)

Set alphabet on various matchers. This is useful when using an unusual alphabet (most often when using line-aware parsing), as it saves having to specify it on each matcher when creating the grammar.

full_first_match(self, eos=True)

Raise an error if the first match fails. If eos is True then this requires that the entire input is matched, otherwise it only requires that the matcher succeed. The exception includes information about the deepest read to the stream (which is a good indication of where any error occurs).

This is part of the default configuration. It can be removed with no_full_first_match().
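
A sketch of the eos behaviour. The grammar and input are illustrative, and it is assumed here that FullFirstMatchException is exported from the top-level lepl package:

    from lepl import *

    matcher = Literal('hello')
    matcher.config.full_first_match(eos=True)

    try:
        # eos=True requires the entire input to match, so ' world' causes an error.
        matcher.parse('hello world')
    except FullFirstMatchException as e:
        print(e)   # reports the deepest point read in the stream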

flatten(self)

Combine nested And() and Or() matchers. This does not change the parser semantics, but improves efficiency.

This is part of the default configuration. It can be removed with no_flatten.

compile_to_dfa(self, force=False, alphabet=None)

Compile simple matchers to DFA regular expressions. This improves efficiency but may change the parser semantics slightly (DFA regular expressions do not provide backtracking / alternative matches).
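
A sketch (illustrative grammar; the reading of force as "compile even matchers the rewriter would normally leave alone" is an assumption):

    from lepl import *

    # A simple character matcher that can be compiled to a single DFA regex.
    digits = Any('0123456789')[1:, ...]
    digits.config.compile_to_dfa(force=True)

    print(digits.parse('2013'))   # -> ['2013']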

compile_to_nfa(self, force=False, alphabet=None)

Compile simple matchers to NFA regular expressions. This improves efficiency and should not change the parser semantics.

This is part of the default configuration. It can be removed with no_compile_to_regexp.

compile_to_re(self, force=False, alphabet=None)

Compile simple matchers to re (C library) regular expressions. This improves efficiency but may change the parser semantics slightly (re regular expressions do not provide backtracking / alternative matches).

optimize_or(self, conservative=False)

Rearrange arguments to Or() so that left-recursive matchers are tested last. This improves efficiency, but may alter the parser semantics (the ordering of multiple results with ambiguous grammars may change).

conservative refers to the algorithm used to detect loops; False may classify some left-recursive loops as right-recursive.
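
A sketch with the left-recursive alternative deliberately listed first (the grammar is illustrative; Delayed() is the standard Lepl idiom for recursive grammars):

    from lepl import *

    expr = Delayed()
    # The left-recursive alternative comes first; optimize_or can move it last.
    expr += Or(expr & Any('-') & Integer(), Integer())

    expr.config.optimize_or(conservative=True).auto_memoize()
    print(expr.parse('3-2'))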

lexer(self, alphabet=None, discard=None, lexer=None)

Detect the use of Token() and modify the parser to use the lexer. If tokens are not used, this has no effect on parsing.

This is part of the default configuration. It can be disabled with no_lexer.
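
A sketch using tokens (since lexer() is applied automatically by the default configuration, the explicit call and the discard regexp here are purely illustrative):

    from lepl import *

    value = Token(Integer())
    comma = ~Token(',')
    csv = value & (comma & value)[:]

    csv.config.lexer(discard=r'[ \t]+')   # skip spaces and tabs between tokens
    print(csv.parse('1, 2, 3'))           # -> ['1', '2', '3']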

direct_eval(self, spec=None)

Combine simple matchers so that they are evaluated without trampolining. This improves efficiency (particularly because it reduces the number of matchers that can be memoized).

This is part of the default configuration. It can be removed with no_direct_eval.

compose_transforms(self)

Combine transforms (functions applied to results) with matchers. This may improve efficiency.

This is part of the default configuration. It can be removed with no_compose_transforms.

auto_memoize(self, conservative=None, full=True, d=0)

This configuration attempts to detect which memoizer is most effective for each matcher. As such it is a general "fix" for left-recursive grammars and is suggested in the warning shown when the right-only memoizer detects left recursion.

Lepl does not guarantee that all left-recursive grammars are handled correctly. The corrections applied may be incomplete and can be inefficient. It is always better to re-write a grammar to avoid left-recursion. One way to improve efficiency, at the cost of less accurate matching, is to specify a non-zero d parameter - this is the maximum iteration depth that will be used (by default, when d is zero, it is the length of the remaining input, which can be very large).
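
A sketch of the left-recursive case this configuration targets (grammar and input illustrative):

    from lepl import *

    expr = Delayed()
    expr += (expr & Any('+') & Integer()) | Integer()

    # full=True considers all matchers; d=0 lets the iteration depth default to
    # the remaining input length (set d to a small positive value to bound it).
    expr.config.auto_memoize(full=True, d=0)
    print(expr.parse('1+2+3'))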

left_memoize(self, d=0)

Add memoization that may detect and stabilise left-recursion. This makes the parser more robust (so it can handle more grammars) but also more complex (and probably slower).

config.auto_memoize() will also add memoization, but will select left/right memoization depending on the path through the parser.

Lepl does not guarantee that all left-recursive grammars are handled correctly. The corrections applied may be incomplete and can be inefficient. It is always better to re-write a grammar to avoid left-recursion. One way to improve efficiency, at the cost of less accurate matching, is to specify a non-zero d parameter - this is the maximum iteration depth that will be used (by default, when d is zero, it is the length of the remaining input, which can be very large).

right_memoize(self)

Add memoization that can make some complex parsers (with a lot of backtracking) more efficient. This also detects left-recursive grammars and displays a suitable warning.

This is included in the default configuration. For simple grammars it may make things slower; it can be disabled by config.no_memoize().

no_memoize(self)

Remove memoization. To use the default configuration without memoization (which may be faster in some cases), specify config.no_memoize().
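
For example (the grammar is illustrative):

    from lepl import *

    matcher = Real()[1:, ~Literal(',')]   # comma-separated reals
    matcher.config.no_memoize()           # default configuration minus memoisation
    print(matcher.parse('1.5,2.5'))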

lines(self, discard=None, tabsize=8, block_policy=None, block_start=None)

Configure "offside parsing". This enables lexing and adds extra tokens to mark the start and end of lines. If block_policy is specified then the line start token will also include spaces which can be used by the Block() and BLine() matchers to do offside (whitespace-aware) parsing.

discard is the regular expression to use to identify spaces between tokens (by default, spaces and tabs).

The remaining parameters are used only if at least one of block_policy and block_start is given.

block_policy decides how indentation is calculated. See explicit etc in lepl.lexer.blocks.matchers.

block_start is the initial indentation (by default, zero). If set to lepl.lexer.lines.matchers.NO_BLOCKS indentation will not be checked (useful for tests).

tabsize is used only if block_policy is given. It is the number of spaces used to replace a leading tab (no replacement if None).
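
Since the parameters interact, a sketch may help. Everything here is an assumption rather than documented behaviour: the grammar, the discard regexp, and in particular that Block(), BLine() and the explicit policy mentioned above are all exported from the top-level lepl package (their locations vary between Lepl versions):

    from lepl import *   # assumes Block, BLine and explicit are exported here

    word = Token(Word(Lower()))
    intro = ~Token(':')

    statement = Delayed()
    simple = BLine(word[:])
    block = BLine(word[:] & intro) & Block(statement[:])
    statement += simple | block

    program = statement[:]
    program.config.lines(discard=r'[ \t]+', tabsize=4, block_policy=explicit)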

trace_stack(self, enabled=False)

Add a monitor to trace results using TraceStack().

This is not used by default as it has a cost at runtime.

trace_variables(self)

Add a monitor to correctly insert the transforms needed when using the TraceVariables() context:

with TraceVariables():
    ...

This is used by default as it has no runtime cost (once the parser is created).
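
A fuller sketch (the grammar is illustrative; trace output is written while parsing):

    from lepl import *

    with TraceVariables():
        word = Word()
        phrase = word & (~Space() & word)[:]

    phrase.config.trace_variables()   # already present in the default configuration
    phrase.parse('hello world')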

low_memory(self, queue_len=100)

Reduce memory use (at the expense of backtracking).

This will:
- Add a monitor to manage resources.  See `GeneratorManager()`.
- Disable direct evaluation (more trampolining gives more scope
  for removing generators)
- Disable the full first match error (which requires a copy of the
  input for the error message)
- Disable memoisation (which would keep input in memory)

This reduces memory usage, but makes the parser less reliable.
Usually a value like 100 (the default) for the queue length will make 
memory use insignificant and still give a useful first parse.

Note that, although the parser will use less memory, it may run
more slowly (as extra work needs to be done to "clean out" the 
stored values).
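
A sketch for a large input (the grammar and file name are hypothetical; parse_file is the standard Lepl entry point for file input):

    from lepl import *

    record = Word() & ~Literal('\n')
    records = record[:]

    records.config.low_memory(queue_len=200)
    # Only a bounded window of the (hypothetical) large file is kept in memory.
    results = records.parse_file(open('big-input.txt'))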

cache_level(self, level=1)

Control when the stream can be cached internally (this is used for debugging and error messages) - streams are cached for debugging when the value is greater than zero. The value is incremented each time a new stream is constructed (eg when constructing tokens).

A value of 1 implies that a stream would be always cached. A value of 0 might be used when iterating over a file with the lexer - the iteration is not cached, but individual tokens will be.

record_deepest(self, n_before=6, n_results_after=2, n_done_after=2)

Add a monitor to record the deepest match. See RecordDeepest().

default(self)

Provide the default configuration (deleting what may have been configured previously). This is equivalent to the initial configuration. It provides a moderately efficient, stable parser.
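
For example, the two extremes (the matcher is illustrative):

    from lepl import *

    matcher = Integer()

    # Start from an empty configuration, then opt in to specific rewriters.
    matcher.config.clear().flatten().compose_transforms()

    # Or discard the above and restore the standard setup.
    matcher.config.default()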