Andrew Cooke | Contents | Latest | RSS | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Choochoo Training Diary

Last 100 entries

Surprise Paradox; [Books] Good Author List; [Computing] Efficient queries with grouping in Postgres; [Computing] Automatic Wake (Linux); [Computing] AWS CDK Aspects in Go; [Bike] Adidas Gravel Shoes; [Computing, Horror] Biological Chips; [Books] Weird Lit Recs; [Covid] Extended SIR Models; [Art] York-based Printmaker; [Physics] Quantum Transitions are not Instantaneous; [Computing] AI and Drum Machines; [Computing] Probabilities, Stopping Times, Martingales; bpftrace Intro Article; [Computing] Starlab Systems - Linux Laptops; [Computing] Extended Berkeley Packet Filter; [Green] Mainspring Linear Generator; Better Approach; Rummikub Solver; Chilean Poetry; Felicitations - Empowerment Grant; [Bike] Fixing Spyre Brakes (That Need Constant Adjustment); [Computing, Music] Raspberry Pi Media (Audio) Streamer; [Computing] Amazing Hack To Embed DSL In Python; [Bike] Ruta Del Condor (El Alfalfal); [Bike] Estimating Power On Climbs; [Computing] Applying Azure B2C Authentication To Function Apps; [Bike] Gearing On The Back Of An Envelope; [Computing] Okular and Postscript in OpenSuse; There's a fix!; [Computing] Fail2Ban on OpenSuse Leap 15.3 (NFTables); [Cycling, Computing] Power Calculation and Brakes; [Hardware, Computing] Amazing Pockit Computer; Bullying; How I Am - 3 Years Post Accident, 8+ Years With MS; [USA Politics] In America's Uncivil War Republicans Are The Aggressors; [Programming] Selenium and Python; Better Walking Data; [Bike] How Fast Before Walking More Efficient Than Cycling?; [COVID] Coronavirus And Cycling; [Programming] Docker on OpenSuse; Cadence v Speed; [Bike] Gearing For Real Cyclists; [Programming] React plotting - visx; [Programming] React Leaflet; AliExpress Independent Sellers; Applebaum - Twilight of Democracy; [Politics] Back + US Elections; [Programming,Exercise] Simple Timer Script; [News] 2019: The year revolt went global; [Politics] The world's most-surveilled cities; [Bike] Hope Freehub; [Restaurant] Mama Chau's (Chinese, Providencia); [Politics] Brexit Podcast; [Diary] Pneumonia; [Politics] Britain's Reichstag Fire moment; install cairo; [Programming] GCC Sanitizer Flags; [GPU, Programming] Per-Thread Program Counters; My Bike Accident - Looking Back One Year; [Python] Geographic heights are incredibly easy!; [Cooking] Cookie Recipe; Efficient, Simple, Directed Maximisation of Noisy Function; And for argparse; Bash Completion in Python; [Computing] Configuring Github Jekyll Locally; [Maths, Link] The Napkin Project; You can Masquerade in Firewalld; [Bike] Servicing Budget (Spring) Forks; [Crypto] CIA Internet Comms Failure; [Python] Cute Rate Limiting API; [Causality] Judea Pearl Lecture; [Security, Computing] Chinese Hardware Hack Of Supermicro Boards; SQLAlchemy Joined Table Inheritance and Delete Cascade; [Translation] The Club; [Computing] Super Potato Bruh; [Computing] Extending Jupyter; Further HRM Details; [Computing, Bike] Activities in ch2; [Books, Link] Modern Japanese Lit; What ended up there; [Link, Book] Logic Book; Update - Garmin Express / Connect; Garmin Forerunner 35 v 230; [Link, Politics, Internet] Government Trolls; [Link, Politics] Why identity politics benefits the right more than the left; SSH Forwarding; A Specification For Repeating Events; A Fight for the Soul of Science; [Science, Book, Link] Lost In Math; OpenSuse Leap 15 Network Fixes; Update; [Book] Galileo's Middle Finger; [Bike] Chinese Carbon Rims; [Bike] Servicing Shimano XT Front Hub HB-M8010; [Bike] Aliexpress Cycling Tops; [Computing] Change to ssh handling of multiple identities?; [Bike] Endura Hummvee Lite II; [Computing] Marble Based Logic; [Link, Politics] Sanity Check For Nuclear Launch; [Link, Science] Entropy and Life

© 2006-2017 Andrew Cooke (site) / post authors (content).

Avoiding the Python Stack

From: "andrew cooke" <andrew@...>

Date: Wed, 11 Feb 2009 09:17:38 -0300 (CLST)

The Python stack is notoriously restricted, and Python does not have tail
optimisation, so any code that uses recursion can quickly fail.

Enhanced generators (available since Python 2.5) provide a solution, but
one that has some limitations.

In short - generators can be used to implement trampolining.  The program
stack then becomes an array on the heap and is "unlimited" in size.

The simplest way to implement this is to rewrite a program by:

  1 - replace "return x" with "yield (True, x)"

  2 - replace "f(x)" with "yield (False, f, x)"

The program can then be run with code something like (untested):

  def run(f, args):
    stack = [f]
    while stack:
      result = stack[-1].send(args)
      if result[0]:
        stack.pop()
        args = result[1:]
      else:
        stack.push(result[1])
        args = result[2:]
    return args

That's not quite correct as it assumes that everything (ie all the "f"s
above) is already a generator.  And it doesn't handle exceptions.

There's more here - http://www.python.org/dev/peps/pep-0342/

And I'm still unclear on what happens when a function yields (in the
original form).  But I think this is the right track...

The main restriction is that *every* function needs to be aware of the new
protocol (unless they simply return values returned from sub-calls, I
think, since these will now be generators rather than "real" values).

I assume this is all "obvious" if you understand what coroutines are, but
the details seem pretty complex to me.

Andrew

Handling Yield

From: "andrew cooke" <andrew@...>

Date: Wed, 11 Feb 2009 14:56:17 -0300 (CLST)

I've been thinking about this some more over a (late) lunch.

Since a generator is returned "magically" you cannot require a return to
be formatted in a special way.

Another way of saying the same thing:

  def mygen():
    ...
    yield x

is always going to return a generator.  It cannot yield (False,
generator).  So we need a special marker to indicate something must be
evaluated.  For now I'll use The ToRun() constructor.

But then think how this would be used:

  for x in mygen():
    ...

would work, but doesn't trampoline.  If we instead had:

  for x in yield(ToRun(mygen)):
    ...

the yield would return to the top level, which would recognise the ToRun,
somehow extract the embedded mygen and call it.  That returns a generator
which is then sent back to the original caller.  So far, everything works
nicely.

The trouble is: what happens if the generator created by mygen wants to
call something else?

  def mygen():
    x = yield ToRun(foo, args)
    yield x

Unfortunately the yield here is going to appear as x in the loop above,
instead of going to the trampoline.

This might be fixable using a global function that all callers to
iterators must use.  But the same effect can be had more transparently by
adding the wrapper within the trampoline.

I suspect I will have to write some code to see if it actually works.  But
for now back to work...

Andrew

More on Co-Routines

From: "andrew cooke" <andrew@...>

Date: Wed, 11 Feb 2009 18:31:57 -0300 (CLST)

Nope, that's not right.  I need to first define how I start
things running.

So, assume the trampoline starts by invoking some function.
That will either return a value, in which case it wasn't
expecting to be started by the trampoline, or a generator.

If it returns a generator then either it wasn't expecting to
be called by the trampoline system and that's an actual
result, or the generator is something that should be managed
by the trampoline.

Assuming it should be managed by the trampoline, the
trampoline calls next().

If it receives a ToRun() value then it pushes the generator
on the stack and runs the "subroutine".

  If the subroutine returns a non-generator value then it
  wasn't expecting to be called by the trampoline, but no
  worries, we can send back the value anyway.

  If the subroutine returns a generator then we need to
  assume that it was expecting to be invoked by the
  trampoline.  We take next from the generator and look at
  the value.  If it's a ToRun then we can repeat as before.

  If it's not a ToRun then we must assume that it contains a
  value that is being "returned" by being yielded.  We have
  to assume this because a generator can't simply return.
  So if sometihng is going to call via the trampoline (it
  contains a yield ToRun) then it can only yield, not
  return.

  (Although another option would be to raise some special
  kind of exception to return values, I guess).

  Anywya, that assumption is OK - the generator is discarded
  and the value sent back to the previous caller from the
  stack.

  The problem is: how to return a generator?  I gues it just
  yields a generator.  That may not be a big issue - I'm
  just worried that

Next problem, then.  We can handle things that don't expect
to be called by the trampoline.  But other things cannot
handle things that expect to have been called by the
trampoline, because they will receive a generator instead of
a value.

So there has to be a strict division between functions that
are called normally and those that are trampolined.

How about if we say that matchers are the only things that
should be trampolined?  But that means they must yield
generators.  Because if they yield directly the trampoline
has broken the generator before it knows it's not a ToRun.

Unless the trampoline itself wraps generators.  Then it
could return that value.

Would that work?  We can wrap generators automatically.
This won't work for "normal" functions (if they return
normally they can't yield, so can't call out via the
trampoline).  But that's OK because they'll just work
normally.

So what I'm saying is that only generators can trampoline.
This isn't necessarily a strong constraint, I'm just
choosing it to simplify things.  Well, no it is a strong
constraint, because we're trampolining between coroutines.
I guess what I mean is that I am not going to look at
automatic handling on non-generators and I am going to avoid
issues with bootstrapping from the initial non-generator
call by isolating the generators themselves.

This has the side effect of starting the trampolinging at
the first found generator, rather than at the initial call.
I don't think that's an issue.

OK, so most of the above is incorrect (or at least
incomplete).  We only trampoline once generators are "up and
running" within normal code.  We then intercept the value
returned from a generator via a wrapper and detect flagged
call values.  These are handled by the normal trampoline
process.

That might sound like we're back in the same description as
above, but we no longer assume that the return value from
the called function is trampoline aware.  Instead, we call
it normally and get some function back.  Ah, crap - that
doesn't work because we're back where we started.  If we
simply send the returned value then we've not intercepted
any later calls.

OK, so backstep and try again.  The trampoline is at the top
level and is asked to evaluate a matcher.  The matcher is
guaranteed to return a generator.  The tampoline invokes the
matcher and receives the generator.  It calls next on the
generator and, if it's a normal value returns it as the
matcher's first result.  Otherwise, if it's tagged, it's a
call to a new matcher.  That matcher is invoked similarly
(from the top level).

That works.  What has changed?

There's a weird asymmetry required.  It only works if you
start with a function that returns a generator.  It's not
the generator that's wrapped, it's the entire matcher.

After-work beer is kicking in.  Time to let that soak for a
while.

Andrew

Clearer

From: "andrew cooke" <andrew@...>

Date: Wed, 11 Feb 2009 20:21:27 -0300 (CLST)

So, the "asymmetry" I mentioned above is basically because you have have
the trampoline above the coroutines (this is so obvious it may not be
clear what I mean).  Otherwise, you're not "going back" on the stack and
the trampolining is pointless.

Anyway, the implementation detail boils down to how to feed results back
to the caller so they appear from an iterator.

I will write some example code and post again...

Andrew

Trampolining Code

From: "andrew cooke" <andrew@...>

Date: Wed, 11 Feb 2009 21:38:14 -0300 (CLST)

So this is, I think, now pretty specific to LEPL, but it clearly works.

Typical output was

  1336,1335,1334,1333,1332,1331,1330,1329,13...
  2060,2059,2058,2057,2056,2055,2054,2053,20...
  221,220,219,218,217,216,215,214,213,212,21...
  742,741,740,739,738,737,736,735,734,733,73...
  308,307,306,305,304,303,302,301,300,299,29...
  1559,1558,1557,1556,1555,1554,1553,1552,15...
  678,677,676,675,674,673,672,671,670,669,66...
  4039,4038,4037,4036,4035,4034,4033,4032,40...
  107,106,105,104,103,102,101,100,99,98,97,9...
  418,417,416,415,414,413,412,411,410,409,40...
  921,920,919,918,917,916,915,914,913,912,91...
  2683,2682,2681,2680,2679,2678,2677,2676,26...
  947,946,945,944,943,942,941,940,939,938,93...
  2089,2088,2087,2086,2085,2084,2083,2082,20...
  304,303,302,301,300,299,298,297,296,295,29...
  3399,3398,3397,3396,3395,3394,3393,3392,33...
  421,420,419,418,417,416,415,414,413,412,41...
  3956,3955,3954,3953,3952,3951,3950,3949,39...
  1949,1948,1947,1946,1945,1944,1943,1942,19...
  3488,3487,3486,3485,3484,3483,3482,3481,34...
  0  100  200  300  400  500  600  700  800  900  Traceback ...

So the apparent stack depth of 1000 was easily exceeded.

  from random import uniform

  class Next(object):
    # This is the 'tag' class used to indicate that a value being
    # yielded is actually a coroutine that should be evaluated.
    def __init__(self, next):
      self.next = next


  def generator(level=0, p=0.001):
    # This returns it's own level with probability p.
    # Otherwise, it constructs a child at a lower level and calls
    # the child (twice, just to make sure children are persistent)
    while True:
      if uniform(0, 1) < p:
	yield str(level)
      else:
	child = generator(level+1, p)
	yield (yield Next(child)) + ',' + str(level)
	yield (yield Next(child)) + ',' + str(level)


  def trampoline(main):
    # A yield is either a 'real' yield to the parent coroutine,
    # or a new coroutine to generate.  The stack only tracks
    # the pending chain of coroutines - many others may exist,
    # but are managed via references internal to other coroutines.
    stack = [main]
    while stack:
      try:
	value = next(stack[-1])
	while not isinstance(value, Next) and len(stack) > 1:
	  stack.pop()
	  value = stack[-1].send(value)
	if isinstance(value, Next):
	  stack.append(value.next)
	else:
	  yield value
      except StopIteration:
	stack.pop()


  if __name__ == '__main__':
    t = trampoline(generator())
    for i in range(20):
      print(next(t))

    def stack_size(depth=0):
      if 0 == depth % 100:
	print(depth, ' ', end='')
      stack_size(depth+1)
    stack_size()


I'd like the main loop (in trampoline) to be neater.

Note in particular how object creation is now in-line.

Andrew

Correction on Python Stack

From: "andrew cooke" <andrew@...>

Date: Wed, 11 Feb 2009 21:45:26 -0300 (CLST)

The size of the stack, since 2.6, can be set via the _thread lib. 
Unfortunately it's still all rather platform-specific.

Andrew

Simplified Code

From: "andrew cooke" <andrew@...>

Date: Wed, 11 Feb 2009 23:19:16 -0300 (CLST)

Since we can only work with "dedicated" classes, we can exploit the fact
that we know the types to avoid tagging.


  from random import uniform
  from types import GeneratorType

  def node(level=0, p=0.0001):
    if uniform(0, 1) < p:
      yield str(level)
    else:
      child1 = node(level+1, p)
      child2 = node(level+1, p)
      yield (yield child1) + ',' + str(level)
      yield (yield child2) + ',' + str(level)
      yield (yield child1) + ',' + str(level)

  def trampoline(main):
    try:
      stack = [main]
      value = next(main)
      while True:
	if type(value) is GeneratorType:
	  stack.append(value)
	  value = next(stack[-1])
	else:
	  stack.pop()
	  if stack:
	    value = stack[-1].send(value)
	  else:
	    yield value
	    stack = [main]
	    value = next(main)
    except StopIteration:
      pass

  if __name__ == '__main__':
    t = trampoline(node())
    for i in t:
      print(i)

Finally, Clean Main Loop

From: "andrew cooke" <andrew@...>

Date: Thu, 12 Feb 2009 00:14:32 -0300 (CLST)

def trampoline(main):
    try:
        stack = []
        value = main
        while True:
            if type(value) is GeneratorType:
                stack.append(value)
                value = next(stack[-1])
            else:
                stack.pop()
                if stack:
                    value = stack[-1].send(value)
                else:
                    yield value
                    value = main
    except StopIteration:
        pass

LEPL Roadplan

From: "andrew cooke" <andrew@...>

Date: Thu, 12 Feb 2009 08:53:57 -0300 (CLST)

I suspect trampolining is less efficient.  At the same time, it would make
trace easier to implement.

Because yield is a statement, not a function, it is difficult (impossible)
to have code that works either way.  But I now have the ability to rewrite
the matcher tree, so can substitute implementations.

The question is: what to do for next release?

I think the answer is to leave optimisations aside for now.  Get things
working one way (with trampolining and simplified trace).  Release that
then look at speed.

Andrew

Comment on this post