| Andrew Cooke | Contents | Latest | RSS | Twitter | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Lepl parser for Python.

Colorless Green.

Photography around Santiago.

SVG experiment.

Professional Portfolio

Calibration of seismometers.

Data access via web services.

Cache rewrite.

Extending OpenSSH.

C-ORM: docs, API.

Last 100 entries

Calling C From Fortran 95; Bjork DJ Set; Z3 Example With Python; Week 1; Useful Guide To Starting With IJulia; UK Election + Media; Review: Reinventing Organizations; Inline Assembly With Julia / LLVM; Against the definition of types; Dumb Crypto Paper; The Search For Quasi-Periodicity...; Is There An Alternative To Processing?; CARDIAC (CARDboard Illustrative Aid to Computation); The Bolivian Case Against Chile At The Hague; Clear, Cogent Economic Arguments For Immigration; A Program To Say If I Am Working; Decent Cards For Ill People; New Photo; Luksic And Barrick Gold; President Bachelet's Speech; Baltimore Primer; libxml2 Parsing Stream; configure.ac Recipe For Library Path; The Davalos Affair For Idiots; Not The Onion: Google Fireside Chat w Kissinger; Bicycle Wheels, Inertia, and Energy; Another Tax Fraud; Google's Borg; A Verion That Redirects To Local HTTP Server; Spanish Accents For Idiots; Aluminium Cans; Advice on Spray Painting; Female View of Online Chat From a Male; UX Reading List; S4 Subgroups - Geometric Interpretation; Fucking Email; The SQM Affair For Idiots; Using Kolmogorov Complexity; Oblique Strategies in bash; Curses Tools; Markov Chain Monte Carlo Without all the Bullshit; Email Para Matias Godoy Mercado; The Penta Affair For Idiots; Example Code To Create numpy Array in C; Good Article on Bias in Graphic Design (NYTimes); Do You Backup github?; Data Mining Books; SimpleDateFormat should be synchronized; British Words; Chinese Govt Intercepts External Web To DDOS github; Numbering Permutations; Teenage Engineering - Low Price Synths; GCHQ Can Do Whatever It Wants; Dublinesque; A Cryptographic SAT Solver; Security Challenges; Word Lists for Crosswords; 3D Printing and Speaker Design; Searchable Snowden Archive; XCode Backdoored; Derived Apps Have Malware (CIA); Rowhammer - Hacking Software Via Hardware (DRAM) Bugs; Immutable SQL Database (Kinda); Tor GPS Tracker; That PyCon Dongle Mess...; ASCII Fluid Dynamics; Brandalism; Table of Shifter, Cassette and Derailleur Compatability; Lenovo Demonstrates How Bad HTTPS Is; Telegraph Owned by HSBC; Smaptop - Sunrise (Music); Equation Group (NSA); UK Torture in NI; And - A Natural Extension To Regexps; This Is The Future Of Religion; The Shazam (Music Matching) Algorithm; Tributes To Lesbian Community From AIDS Survivors; Nice Rust Summary; List of Good Fiction Books; Constructing JSON From Postgres (Part 2); Constructing JSON From Postgres (Part 1); Postgres in Docker; Why Poor Places Are More Diverse; Smart Writing on Graceland; Satire in France; Free Speech in France; MTB Cornering - Where Should We Point Our Thrusters?; Secure Secure Shell; Java Generics over Primitives; 2014 (Charlie Brooker); How I am 7; Neural Nets Applied to Go; Programming, Business, Social Contracts; Distributed Systems for Fun and Profit; XML and Scheme; Internet Radio Stations (Curated List); Solid Data About Placebos; Half of Americans Think Climate Change Is a Sign of the Apocalypse; Saturday Surf Sessions With Juvenile Delinquents; Ssh, tty, stdout and stderr; Feathers falling in a vacuum; Santiago 30m Bike Route

© 2006-2015 Andrew Cooke (site) / post authors (content).

40-fold Speedup in LEPL Parsing

From: "andrew cooke" <andrew@...>

Date: Fri, 6 Mar 2009 19:51:54 -0300 (CLST)

OK, so this was a somewhat extreme test case, but by rearranging the
grammar I can make it more likely to terminate and so avoid lots of
pointless recursion.  Here's the code that does the rewriting - I'm
pleased at how simple it is, after all the work making matchers graphs,
etc:

def optimize_or(graph):
  '''
  When a left-recursive rule is used, it is much more efficient
  if it appears last in an `Or` statement, since that forces the
  alternates (which correspond to the terminating case in a
  recursive function) to be tested before the LMemo limit is
  reached.

  This rewriting may change the order in which different results
  for an ambiguous grammar are returned.
  '''
  for delayed in [x for x in preorder(graph)
                  if type(x) is Delayed]:
    for loop in loops(delayed):
      for i in range(len(loop)):
        if isinstance(loop[i], Or):
          # we cannot be at the end of the list here,
          # since that is a Delayed instance
          matchers = loop[i].matchers
          target = loop[i+1]
          # move target to end of list
          index = matchers.index(target)
          del matchers[index]
          matchers.append(target)
  return graph

Even on a more reasonable case (the LMemo example in the docs), the
speedup is about a factor of 2 (that includes some additional tweaks
elsewhere).

It's surprising how uniform the profiles are (the flip side of that is
that there's little to optimise from a code-tweaking pov).  The only other
improvement I can think of at the moment is adding a tokenizer (which will
allow LMemo lookahead to use larger "chunks", advancing by words rather
than letters).

Andrew

Comment on this post