
Processing Large Volumes of Data in Lepl

From: andrew cooke <andrew@...>

Date: Tue, 1 Mar 2011 21:35:03 -0300

Lepl was designed to have the ability to parse large (as in, larger than the
amount of available memory) volumes of data.  However, I shamefully admit that
I never tested this (until now, I assumed no-one would care, and I hadn't had
a need for it).

But someone on the mailing list said that it was important to them so, as part
of the Lepl 5 work, I wrote some code to check that large inputs were handled
correctly.

It turns out, of course, that they weren't.  And so I have been spending way
too much time trying to debug memory use.  But the good news is that, with the
very latest code, and some care, it is possible to process arbitrary amounts
of input.

To show how this works I will work through the test code I am running (which
is processing several GB of data, but using only 0.5% of my laptop's memory to
do so!).


First, for my test, I needed a problem that was easy to describe, and which
"used", in a sense, all the input data, but didn't have a huge result (even
Lepl can't make a large result take no room...).  So I decided to use data in
which each "line" was a string containing a number, incrementing from 1 ("1",
"2", ...).  The result would be the sum of the first digit of each of the
numbers.

To avoid keeping all the input data in memory at once we must use an iterator
as the data source.  In normal use this would probably be a file (the result
of "open(...)" can be passed to Lepl's parse() methods), but for the test the
source looks like:

  from itertools import count, imap, takewhile

  def source(n_max):
      return imap(str, takewhile(lambda n: n <= n_max, count(1)))

(Note that this is Python 2 code, so I am using imap - in Python 3 that would
be a normal map).
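
For readers on Python 3, a minimal sketch of the same source might look like the following (same function name as above; only the imap/map change is assumed):

```python
from itertools import count, takewhile


def source(n_max):
    # Lazily yields "1", "2", ..., str(n_max); nothing is stored in a list.
    return map(str, takewhile(lambda n: n <= n_max, count(1)))


first_three = list(source(3))
```

Because map is lazy in Python 3, the values are still produced one at a time as the parser asks for them.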

I also need a matcher that returns the first digit:

  @sequence_matcher
  def Digits(support, stream):
      (number, next_stream) = s_line(stream, False)
      for digit in number:
          yield ([int(digit)], next_stream)


At this point I hit my first problem: how to add numbers?  Normally I would
use "sum()" as a transformation of the results.  But that implies that the
results must first be generated (as a list).  For a large input we cannot do
this (the list could get too large).

After thinking about this I decided to extend the way that repetition works in
Lepl with a "reduce" parameter.  This can be specified with Override:

  with Override(reduce=...):
      matcher = foo[:]

or as an argument for Repeat (see below).  The value of reduce is the tuple
(zero, join) where "zero" is the initial value and join acts as an iterative
fold.  So for no repeats, the result is "zero"; for one repeat it is the
result of "join(zero, result1)", for two "join(join(zero, result1), result2)".

For my example I used:

  sum = ([0], lambda a, b: [a[0] + b[0]])
  Repeat(Digits(), reduce=sum, ...)
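
To make the (zero, join) semantics concrete, here is a sketch in plain Python 3 that does not use Lepl at all.  The helper name fold is my own, and the first digit is taken directly with line[0] rather than by a matcher; the point is only to show that the fold never builds a list of results:

```python
from functools import reduce
from itertools import count, takewhile


def source(n_max):
    return map(str, takewhile(lambda n: n <= n_max, count(1)))


def fold(reduce_spec, results):
    # reduce_spec is the (zero, join) tuple described above:
    # no results -> zero; otherwise join(join(zero, r1), r2) and so on.
    zero, join = reduce_spec
    return reduce(join, results, zero)


sum_first = ([0], lambda a, b: [a[0] + b[0]])

# A generator, so only one "line" exists in memory at any moment.
first_digits = ([int(line[0])] for line in source(1000))
total = fold(sum_first, first_digits)
empty = fold(sum_first, [])
```

For 1 to 1000 the first digits sum to 45 + 450 + 4500 + 1, so total should be [4996], and an empty input gives back the zero value.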


At this point I thought things should work.  Of course, I still had to add the
appropriate magic to the configuration:

  p.config.add_monitor(GeneratorManager(10))

which would restrict the number of "pending" generators to just 10.  This is
the functionality that has always been in Lepl and that was intended for
cases like this.  It is actually the source of a fair amount of complexity in
the implementation (to keep track of the generators and to discard the "least
used" at any point).
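
The idea of limiting pending generators can be sketched as follows.  This is not Lepl's GeneratorManager (which is more involved); BoundedGenerators and touch are hypothetical names, and the sketch simply closes the least recently used generator once the limit is exceeded:

```python
from collections import OrderedDict


class BoundedGenerators:
    """Hypothetical helper: keep at most `limit` generators open."""

    def __init__(self, limit):
        self.limit = limit
        # id(generator) -> generator, least recently used first.
        self._open = OrderedDict()

    def touch(self, generator):
        """Register a generator, or mark a known one as recently used."""
        key = id(generator)
        self._open[key] = generator
        self._open.move_to_end(key)
        while len(self._open) > self.limit:
            _, stale = self._open.popitem(last=False)
            stale.close()  # a closed generator cannot be resumed


def digits(text):
    for character in text:
        yield int(character)


manager = BoundedGenerators(limit=2)
first, second, third = digits("12"), digits("34"), digits("56")
for generator in (first, second, third):
    manager.touch(generator)

evicted = list(first)     # closed when `third` was registered
still_open = list(third)
```

The cost is that a closed generator can no longer be used for backtracking, which is exactly the trade being made when parsing huge inputs.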

Unfortunately, as I have already said, this didn't work.  Memory use exploded
and my laptop crashed.


It is probably worth going into a little detail about how to debug memory in
Python.  The word "debug" is a little misleading, because there is no bug in
Python - just in my code.

But before I do that I guess I also need to explain a little more about how I
handle the input.  The iterator (a file or, in this test, the number source)
is wrapped in some classes that create a linked list of lines: each line knows
about the line that follows (if necessary, calling the iterator to return the
next value).  The happy consequence of this is that when the first line has
been parsed, and no object has a reference to it, it can be garbage collected
by Python (the list links to lower lines, but not back up to the previous
lines).

So "all" that is needed for us to handle a large amount of input is for us to
use this linked-list "trick", and to make sure that there are no references to
lines once they have been parsed.
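
A minimal sketch of the linked-list idea (Python 3, with hypothetical names Line and first_line, not Lepl's actual stream classes) shows why earlier lines can be collected: each node points forwards only.

```python
import gc
import weakref


class Line:
    """One node of a lazily built, forward-only linked list of lines."""

    def __init__(self, text, iterator):
        self.text = text
        self._iterator = iterator
        self._next = None
        self._exhausted = False

    def next_line(self):
        # Pull the next value from the iterator only when it is first needed.
        if self._next is None and not self._exhausted:
            try:
                self._next = Line(next(self._iterator), self._iterator)
            except StopIteration:
                self._exhausted = True
        return self._next


def first_line(iterable):
    iterator = iter(iterable)
    try:
        return Line(next(iterator), iterator)
    except StopIteration:
        return None


line = first_line(str(n) for n in range(1, 4))
watcher = weakref.ref(line)   # observes the first node without keeping it alive
line = line.next_line()       # drop our only reference to the first node
gc.collect()
first_node_freed = watcher() is None
```

If anything else (a matcher, a stack, a debug message) had kept a reference to the first node, watcher() would still return it, and every later line would stay reachable from it.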

Debugging the problem, then, means finding out what objects exist, and who has
a reference to them.  There are two tools that help.

First, there's a package called Guppy, with a module called hpy, which will
show heap usage.  You use it like this:

  from guppy import hpy
  h = hpy()
  hp = h.heap()
  print(hp)

and typical output might be:

Partition of a set of 335881 objects. Total size = 48804432 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  10008   3 10488384  21  10488384  21 dict of lepl.core.manager.GeneratorRef
     1  10000   3 10480000  21  20968384  43 dict of 0x846620
     2  61337  18  4753648  10  25722032  53 tuple
     3  10505   3  3897560   8  29619592  61 dict (no owner)
     4  40714  12  3257120   7  32876712  67 types.MethodType
     5  29645   9  2265384   5  35142096  72 str
     6  20020   6  1923152   4  37065248  76 unicode
     7  20596   6  1812448   4  38877696  80 __builtin__.weakref
     8  10897   3  1466120   3  40343816  83 list
     9  59638  18  1431312   3  41775128  86 int
<182 more rows. Type e.g. '_.more' to view.>

You can then play around with the results.  But what was most important was
just to get some idea of what types are using up memory.  Once you have that,
you can go hunting for them with the second tool: Python's gc module.

For example, this code will get all the GeneratorRef instances:

  import gc
  from lepl.core.manager import GeneratorRef
  o = gc.get_objects()
  gr = [oo for oo in o if isinstance(oo, GeneratorRef)]

and then you can find out who has a reference to one of those with
gc.get_referrers():

  gr_owners = gc.get_referrers(gr[0])

In this way you can study the objects in memory and work out why they are
still sticking around.
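
The same gc calls can be tried without Lepl.  In this sketch Tracked and instances_of are made-up names standing in for GeneratorRef and the list comprehension above:

```python
import gc


class Tracked:
    """Stand-in for whatever type the heap report says is piling up."""


def instances_of(cls):
    # issubclass(type(o), cls) avoids touching o.__class__ on odd proxy objects.
    return [o for o in gc.get_objects() if issubclass(type(o), cls)]


holder = {"kept": Tracked()}          # the "leak": something still refers to it
found = instances_of(Tracked)
owners = gc.get_referrers(found[0])   # includes `holder` (and the `found` list)
holder_is_owner = any(owner is holder for owner in owners)
```

Note that the list you build while searching is itself a referrer, so expect to see it among the owners.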


It's unfortunate, but Lepl's backtracking makes it *really* sticky when it
comes to holding on to objects.  The problem is that you never know when you
might need to go all the way back to the start.

While GeneratorManager reduces stickiness in the "stack", it turns out that
Lepl had two other places where references to the input are retained.


The first was on the "topmost" matchers.  My parser was very simple:

  parser = Digits()[:] & Eos()

but the topmost And() causes a problem: when it is first called, it is passed
the first line, and it keeps that reference until the parser completes.  The
lower level Digits() doesn't have the same problem: it is called for each line
in turn; for each line it does the work and then cleans up.  But And() cannot
clean up because it never finishes (well, not until all the input has been
consumed).

And it's not just And().  The top level Repeat() has the same problem.

Because this problem is for a very special case - the topmost matchers when
parsing huge amounts of data - I don't think a general solution is appropriate
(and I can't find one that doesn't affect Lepl deeply in other ways).  So
instead, I have a simple, ugly solution, that does the job.  The parser
becomes:

  NO_STATE(And(NO_STATE(Repeat(Digits(), reduce=sum)), Eos()))

Those two NO_STATE matchers strip the stream reference and remove this
problem.  Brutal but effective.


The second source of stickiness was the "secondary" stacks that are used for
backtracking during search.  In a sense this is the same issue as above: the
topmost Repeat() maintains a reference to the earlier input in an internal
stack that is used on failure.

Again, the solution is fairly brutal: I force the stacks to be limited to a
certain size.  In the future it might be nice to integrate this with
GeneratorManager, but for now it is specified directly:

  NO_STATE(And(NO_STATE(Repeat(Digits(), reduce=sum, clip=10)), Eos()))

Again, note that although this is ugly it is only needed at the very top level
of the parser - the part that exists for all lines.  Anything that is called
for a single line is not affected.
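
As an illustration of what clipping means (this is not how Lepl stores its stacks, just the standard library's bounded deque), old entries fall off the far end once the limit is reached:

```python
from collections import deque

clip = 3
stack = deque(maxlen=clip)   # appending beyond maxlen discards the oldest entry

for line_number in range(1, 6):
    stack.append("state for line %d" % line_number)

remaining = list(stack)
```

Only the last clip entries survive, so backtracking further than that is no longer possible, but the references to earlier input are gone too.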


With those two fixes (and some related internal re-arrangements), memory does
not leak for this test.  There may be some other issues with other matchers,
but I think that the general problems are solved.

So some modification of the "topmost" parsers is needed, and the work takes
time and an understanding of both Python and Lepl, but it is now possible to
parse large inputs.

This work will be released as Lepl 5 (probably before April - the main task
now is updating the documentation).

Andrew


PS Lepl 5 also has improved stream handling which cuts parsing time by
15-35% - https://groups.google.com/d/msg/lepl/lGwLEs3MFFg/Yz4c6vAz90YJ

Evidence :o)

From: andrew cooke <andrew@...>

Date: Tue, 1 Mar 2011 21:48:51 -0300

Here's the output from a test that just finished running:

  >>> print(pp.matcher.tree())
  NO_STATE
   `- matcher UntransformableTrampolineWrapper<And>
       +- NO_STATE
       |   `- matcher UntransformableTrampolineWrapper<DepthFirst>
       |       +- clip 10
       |       +- stop None
       |       +- reduce ([0], <function <lambda> at 0xa8f758>)
       |       +- rest SequenceWrapper<Digits:<>>
       |       +- start 0
       |       `- first SequenceWrapper<Digits:<>>
       `- FunctionWrapper<Eof:<>>
  >>> r = pp(source(10**7)) # keep generators open by not expanding
  >>> print(next(r))
  [49999996]
  >>> h = hpy()
  >>> hp = h.heap()
  >>> hp
  Partition of a set of 45266 objects. Total size = 6254072 bytes.
   Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
       0  18479  41  1654624  26   1654624  26 str
       1  11357  25   914840  15   2569464  41 tuple
       2    149   0   437240   7   3006704  48 dict of module
       3   3013   7   361560   6   3368264  54 types.CodeType
       4   2922   6   350640   6   3718904  59 function
       5    509   1   319928   5   4038832  65 dict (no owner)
       6    332   1   303008   5   4341840  69 dict of type
       7    332   1   297448   5   4639288  74 type
       8    154   0   161392   3   4800680  77 dict of lepl.core.config.ConfigBuilder
       9    141   0   150840   2   4951520  79 dict of class
  <180 more rows. Type e.g. '_.more' to view.>

The input stream went from "1" to "10000000" (10^7).  With unicode that's
about 2 * 10^7 * 7 bytes (roughly 140MB, so of order 100MB).  The total
memory used (note that the snapshot above still has pending results from
backtracking, so the parser still "exists" in memory) is 6MB.

So this is equivalent to parsing a 100MB file using just 6MB of memory.
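
The size estimate can be checked with a little arithmetic.  This sketch counts the characters in "1" to "10000000" by grouping numbers by length, and assumes (as above) 2 bytes per character and no per-object overhead:

```python
def total_digits(n_max):
    """Total number of characters in "1", "2", ..., str(n_max)."""
    total, length, start = 0, 1, 1
    while start <= n_max:
        end = min(n_max, start * 10 - 1)       # last number with this many digits
        total += (end - start + 1) * length
        start *= 10
        length += 1
    return total


characters = total_digits(10**7)
approx_bytes = 2 * characters   # assumption: 2 bytes per unicode character
```

That gives 68,888,897 characters, or roughly 138MB at 2 bytes each, consistent with the rough estimate above.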

(Another simple way to see that input is not in memory is to note that there
are not 10 million instances of any one object!)

Andrew

Fix 1 - No need for NO_STATE

From: andrew cooke <andrew@...>

Date: Wed, 2 Mar 2011 07:45:30 -0300

I've dreamt up a fix that avoids the need for NO_STATE.  The reference to
stream that was being kept is used only for debugging so a good compromise (I
hope) is to use a weak reference.

In fact, a simple weak reference won't do, because a stream is a tuple in Lepl
5 (and tuples cannot be weakly referenced), so it's actually a weak reference
to an anonymous function that returns the stream.  Although this is ubiquitous
code (on the "inner loop" in a broad sense), it doesn't affect performance
much: any cost is more than offset by another fix that I made, which I didn't
have space to explain above, that makes (a lack of) transformations more
efficient.
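
A small sketch of the trick in Python 3 (wrap and the variable names are mine, and the tuple is only a stand-in for a real stream):

```python
import gc
import weakref

stream = ("1", "rest of input")   # stand-in for a Lepl 5 stream tuple

try:
    weakref.ref(stream)
    tuple_supports_weakref = True
except TypeError:
    tuple_supports_weakref = False   # tuples cannot be weakly referenced


def wrap(stream):
    # Functions can be weakly referenced, so wrap the tuple in one.
    return lambda: stream


getter = wrap(stream)              # strong reference, held by the "real" code
debug_ref = weakref.ref(getter)    # weak reference, held for debugging only
seen_while_alive = debug_ref()()   # dereference, then call to get the stream

del getter                         # the real code lets go
gc.collect()
seen_after_release = debug_ref()   # None: the debug reference kept nothing alive
```

The debugging code has to cope with the reference having gone away, but that is a small price for not pinning the whole input in memory.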

Andrew

Fix 2 - No need for explicit clip

From: andrew cooke <andrew@...>

Date: Wed, 2 Mar 2011 08:57:36 -0300

And, as suggested above, I can tie GeneratorManager together with the search
so that clip is set automatically.  The parser now looks like:

    sum = ([0], lambda a, b: [a[0] + b[0]])
    with Override(reduce=sum):
        total = Digits()[:] & Eos()
    total.config.add_monitor(GeneratorManager(10))

Well, almost - I am using a bunch more config options and need to check what
is really necessary...

Andrew

Final Code

From: andrew cooke <andrew@...>

Date: Wed, 2 Mar 2011 21:21:22 -0300

This shows the final code and configuration needed:

https://code.google.com/p/lepl/source/browse/src/lepl/_performance/memory.py

Andrew

Directory (Package) for Django's Model; Compiling pgplot on opensuse 11.1; Comparison of Dual Core E4700 and E6400; Erik Naggum Dead; Oracle on OpenSuse/Linux; Yup; Olympus Pen EP-1 (Micro 4/3) Details; More Iranian Analysis; Improving Nicotine's Response; Neo4j - a Graph Database; MISC - Lazy Lisp with Maps; Nortec Collective - New Album; The Sorry State of UK Politics; Two Contrary Views on Iran; Some Rape Stats Background; More Overvoltage Results; New Mobo; Caring About Programming Languages; Reflections on First Consultancy Gig; Google Squared; Windows 7 on VirtualBox; Smart File Visualisation; Boomerang - Lenses for Text; Datalog Jobs; RT61 Notes; Remote X for Single Programs; Sorting Morphisms; Computers and Intractability; Although Rather Drinkable; Bugger Carmen and their Grande Vidure; A Bomb Won't Go Off Here; 50 Ways to Change Minds; Sector/Sphere - Distributed Computing on Widespread, Heterogenous Networks; Linux-based Cracker Tools; Dear Esther (Half Life 2 Mod); MySQL Forks; Factor of 2 (Northbridge Explanation v2); A Beginners Guide to Forcing; Tiny STM; Erlang Influence?; CUDA Course; Protocol Support; Axum - New Concurrent Language from MS; Not Quite; 92% Faster; 92% Faster; Overclocking E6400 by 60%; Eight stories on Obama [...] 
censored from the Guardian, Observer, Telegraph and New Statesman; Trying to Explain why Mercurial is Good; Mandriva; With Eclipse; Add wwwrun to hg group; Writing to Mercurial; Renewing Chilean Visa; Interactive Mode in PEvolve; Using Mercurial on OpenSuse 11.1; Logitech Duet Love; Clarification from Anandtech; Initial Tokenizer Results for LEPL; Dead from beating?; New Edition of Parsing Techniques; The police: Unaccountable, secretive and out of control; Same Guy; 2.3 Released; Another Thought; Caveats; Compiling Recursive Descent to Regular Expressions; Compiling Recursive Descent to Regular Expressions; Much Better via Co-Routines; Much Better via Co-Routines; Much Better via Co-Routines; Much Better via Co-Routines; Peyton Jones - Implementation of Functional Programming Languages; Great Moments in Logic; The Quiet Coup; Logging Slow Queries in MySQL; Dabo - Desktop Application Framework (Python); Epsilon!; Original NFA; Initial DFA Results; Squeezenter on OpenSuse / Linux - Couldn't create command line for ogg playback; Legalising Polygamy in Utah. 
Ha ha ha.; Implementing a Regular Expression Engine; New Server Configuration; Converting NFA to DFA; Converting NFA to DFA; Converting NFA to DFA; And...; Browser Ball; Auto-layout of Graph Components; Good Article on Poverty in the UK; Does Make Sense; Possibly Complete; Incomplete; PyPy Getting Somewhere?; Corrected Test; I Just Wrote a Regular Exression Engine!; Freaking Awesome YouTube Mixes; Charles Freeman (National Intelligence Council nominee) Statement; 40-fold Speedup in LEPL Parsing; Cities of Bronze and Glass; Cities of Bronze and Glass; Modify Audio with Python; LEPL 2.0 Released; Protocol for copying updated files; Simple LLVM Example - Lisp; MCL - Relatively New Clustering Algorithm?; Fascism now back in Italy?; Declarative (Auckland) GUI Layout; Cybersyn; History of Twentieth-Century Philosophy of Science; Nice Short Summary of Ant v Maven; Yes but no; Sensible Statistics for LHC Risk (Bad News); Simpler Version of Above; Current Economy in Perspective; SSDs Suffer from Fragmentation Issues; LEPL Roadplan; Finally, Clean Main Loop; Simplified Code; Correction on Python Stack; Trampolining Code; Clearer; More on Co-Routines; Transparency Key; Handling Yield; Join The Discussion (Really!); Join the Discussion!; Avoiding the Python Stack; Positive Report on Venezuelan Economy; Papers on Handling Left Recursion in Top-Down Parsers; Works now; Transparent Python Proxy Object for Circular References; Python 3 Instance Attributes as Methods; Alternative Representation; Simple Tree Rewriting; Python Code for ASCII Trees; Natural Language Processing in Python; More Madoff; Recursive Descent Parser; Bria Di Novi; Update; Google Alerts (and LEPL, and setuptools for Python 3); Low Latency(?) 
Kernel for OpenSuse 11.1; Later; Strange Moderation at BB; Max Richter, Prefix, OpenSuse 11.1; Overview of Python Packaging Tools; Error Handling in Recursive Descent Parsers with Backtracking; A Thought On Obama's Inauguration; The Book; So, the King Of Thailand...; "in" as Operator; Happiness...; Python's Operators; Python 3 in OpenSuse 11.1 and Eclipse; Information on Universe's Event Horizon...; Re: OFF; OFF; TiddlyWiki on Tahoe; Tahoe Least Authority Filesystem / AllMyData.org; More wxPython and OGL; With Bactracking; Syntax; Parsing Credits; New Parser in Python; Food in San Francisco; Updated PPOE Script, Extra Tricks for WebMail; Some Notes on OGL with wxPython; Suspend Broken; Bomb, bomb, bomb...; OpenSuse 11.1 on Lenovo/IBM Thinkpad X60; More Ideas; Gaza; Slice Mechanics; Stupid; Since when did Last.fm start to suck so much?; Rethinking Parsing; Radio David Byrne; Pick of the picks (Guardian photographers) + Internet; Problems with OSX (Apple Mac); Script to convert WMA to MP3 on Linux; Command line player for listening to SqueezeCentre on Linux; Basic HTTP Authentication with XMLRPC in Python; Gaza; Tweaking Beagle and KDE; More on Marcela Moncada; Marcela Moncada at the CCU, Santiago; Schrodinger Book Review; Natanz, not Naratz; Snobol Like Matching in Python; Woman Living in Jeddah; Simple Physics Using Verlet Integration; Updated Raid Data Scrubbing Link; Predictably Irrational; Nuclear Enrichment Technology; Recent DnB; Script to Fix MP3 Directories; Young people and territoriality in British cities; Projections; Cube - Series of Images for Laser Printer; This project died soon after...; And even if you won, you lose :o); Madoff as a Jew; Beagle, Computing in Science and Engineering; Fundacion Rodelillo; Use Logitech Squeeze (Slim Devices); Separate DAC for Headphones; SqueezeCenter/SqueezeNetwork; SqueezeCenter gets better!; Logitech Squeezebox Boom on OpenSuse; Krugman - Absolutely Right; Early Investigation into Madoff; Script to Check for dsl0; 
Another Positive Assessment of Chile's Position; Slowly making more sense; PPOE on OpenSuse; Quantum Bees
