| Andrew Cooke | Contents | Latest | RSS | Twitter | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Lepl parser for Python.

Colorless Green.

Photography around Santiago.

SVG experiment.

Professional Portfolio

Calibration of seismometers.

Data access via web services.

Cache rewrite.

Extending OpenSSH.

Last 100 entries

Re: Python's sad, unimaginative Enum; Re: Some explanation; Some explanation; Printing binary trees sideways; Atoms in python; About "Python's sad, unimaginative Enum"; Frustration Understood; Some good feedback here; this is fucking useless; I agree with you #nt; What would be imaginative?; Re: Enum; Enum; Python's sad, unimaginative Enum; Possible Fix; Work, Exhaustion, Vacation; VirtualBox with Centos 6.3 to 6.4, client; Matasano - Programming Lessons Learned; PDF to HTML; Alternate Substitution; Why RSA Works; Trigger; Dreaming of Death; Example: Tracing; Using Coroutines In Protocol Simulations; Python 3.3 Only; Pure Python SHA1 and MD4 Implementations; Ubuntu on VirtualBox; Starting TOR as a service on OpenSuse 12.3; 1001 Albums; Using fail2ban on OpenSuse 12.3; PPPoE on OpenSuse 12.3; Good Article on Unified Physics; It's Police (Carabineros); Linux Software for Listening to and Exploring Music; Android is Pretty Bad; Lucky Number; 3D Printing for Casting; Cover Art for MPDroid; Who'd a thought the French were so bigoted?; PS Input Signal; Small Problem with Roksan K2 Amp; Roksan K2 Amp + ATC SCM7 Speakers; Do What Makes Sense; Re: Arguing About Tests, Still; Arguing About Tests, Still; Images; Good Article on NY Drummers; Related Bug Report; Getting Python 3.3 and Virtualenv Working in OpenSuse 12.3; How I Am; Awesome video about digital audio; The Difference Between Dimensional and Normalized Databases; The rise of the new Chinese bogeyman; Updated Syntax; Very First Steps to C-ORM; The Ideal User Interface For Music Exploration; Can The Republicans Be Saved?; Rate Limiting Calls to EchoNest; Mods to Cache; Comparing UYKFG and UYKFD/E/F; Someone Else is Concerned; EchoNest-based Playlist Generator for MPD; Example Voting Results; A Heavyweight Python Cache; Identifying Artists with EchoNest; Notes on Pregalex / Pregabalina / Lyrica; The Neil Cowley Trio; Drake - Make for Data; A Reliable Python Web Service; Useful Python Date/Time Library?; Need to Sleep, But this is Good; Command Line Set Difference; Little Details...; Linux Command Line Tricks; AutoTools Tutorial; Hangman Tactics; A Tor Proxy Embedded In A Web Page; Tree (Nested Dicts) in Python; Sleeping at Parties; I Know Someone Who Hurts Other People; Light and Tea; Description of the LCS35 Time Capsule Crypto-Puzzle; Re: I can relate to that ...; I can relate to that ...; Re: It's 2012 Why Does My IDE Suck?; My Own Alternative Medicine; Nice explanation of SVM; Why and How Writing Crypto is Hard; Re: It's 2012 Why Does My IDE Suck?; Incremental Regular Expressions; BBC Map Confused at Pole; Social Media: Ground Zero in the Culture War; My Visit to the Psycho Doc; Learning Modern 3D Graphics Programming; Hope you got some crackers to go with the cheese; Re: But how easy would it be ...; But how easy would it be ...; Powerline Freq Fingerprinting of Audio; The Folly of Scientism; Cheese - Because You're Going to Die Anyway

© 2006-2013 Andrew Cooke (site) / post authors (content).

Optimising Clojure

From: andrew cooke <andrew@...>

Date: Wed, 31 Aug 2011 08:54:22 -0300

I had some Clojure code that needed optimising.  Eventually I achieved a 5x
speedup.  This is how.


Finding Something to Measure
----------------------------

First, I needed something to measure, so that I could tell how effective my
changes were.  This was complicated by three things: the need to generate test
data to process; the delayed action of the JIT; garbage collection.

Generating test data is quite expensive, so I needed to carefully tune
parameters so that my processing used more time than the test data generation
(not necessary for the simple timings, but important for profiling, since I
could find no simple way to delay profiling to start after the data was
ready).  Then I placed the processing in a loop, repeating it multiple times
(re-using the same test data).  Each pass through the loop was timed - the
successive times let me see when the JIT engaged (typically the first
calculation was 50% slower) and also reduce the effect of GC by selecting the
lowest value.


Profiling
---------

Once I had a good test running in my IDE, I needed to make standalone Java
classes to profile.  The simplest way to do this seems to be to use a build
tool called leiningen - https://github.com/technomancy/leiningen - which was
very easy to use.  You just download and run it, create an empty project, and
modify the project descriptor file (which I then copied into my existing
project).

My file contents are:

  (defproject compressive "0.0.0"
    :description "set-related experiments"
    :aot [#"com.isti.compset.*"]
    :main com.isti.compset.stack
    :dependencies [[org.clojure/clojure "1.3.0-beta1"]
		   [org.clojure/clojure-contrib "1.2.0"]])

where:

  - aot indicates which classes need compiling
  - main indicates the location of the "main" file
  - dependencies are downloaded automatically

I also added the (-main) method and put (:gen-class) in my ns at the top of
the file.

With all that, "lein compile" generated a set of classes that could be run
with java:

  java -cp classes:lib/clojure-1.3.0-beta1.jar \
    -agentlib:hprof=cpu=samples,depth=10,file=hprof com.isti.compset.stack

giving a profile in the file "hprof".

This profile was dominated by expected methods related to generating
and consuming sequences, and adding and multiplying generic numbers.


Vector of Doubles
-----------------

The central loop in my code sums various spectra and the square of their
values.  The spectra were stored as "vec" instances.  I hope to remove the
use of generic numbers by replacing this with a vector of doubles and, if
necessary, adding some type annotations to various functions.

So I replaced "vec" with

  (defn d-vec [collection]
    (apply conj (vector-of :double) collection))

However, this did not have the desired effect - the code was
significantly slower and the trace was dominated by calls to "Vec.count".

I asked for help on stack overflow http://stackoverflow.com/q/7223297/181772
but didn't get any very useful response, so I removed these changes.


Array of Doubles
----------------

At this point I was a little disheartened.  The traces didn't have much useful
information (that I could see, even using a nice little tool called PerfAnal
http://java.sun.com/developer/technicalArticles/Programming/perfanal/).

So I looked at my inner loop and decided that I would simply rewrite it as I
thought it should be written, relying on experience rather than profiling.  I
also found this page http://clojure.org/java_interop on the use of arrays
helpful.

I made the following changes:

 - replace the vector in which I was accumulating results with a native array
   of doubles (using double-array)

 - mutate the accumulator rather than creating a new instance each loop

 - use additional indices to identify the "active range" of the accumulator
   (which may get smaller over time, due to restricted data ranges in
   spectra).    This had not been necessary when I was generating new
   immutable instances - the instances simply decreased in size.

 - change the spectra from vectors to arrays.

 - replace some maps and reduces with explicit loops that mutate data.

This was, initially a complete disaster.  Running time increased by roughly a
hundredfold.


Type Annotations
----------------

Rather than reverting these changes, which seemed justified from experience, I
looked again at profiling.  The profile was now dominated by methods related
to java introspection.

This is a known issue with Clojure and has a simple solution - add

  (set! *warn-on-reflection* true)

and remove all warnings by adding type hints.  I did not enable this for all
code - only for the central loop and related functions.  And I used the
"^doubles" hint which indicates a native array of doubles.

This, finally, gave a significant speedup - about 5x faster than the initial
code.


Conclusions
-----------

 - Clojure code is simplest, and reasonably efficient, when you stay within
   Clojure's own types (vec, seq etc).

 - Profiling with hprof can give broad information, but I, at least, couldn't
   understand it in detail (I normally can read profiling data!) - there
   seemed to be information missing, or I simply do not understand how Clojure
   works.

 - It was fairly simple to change the code to use typed, mutable data in the
   inner loop, which gave significant speedups.

 - Dynamic interop with Java is apallingly slow, but easy to fix once you see
   that it is happening.

 - More generally, Clojure works fairly well for exploratory data processing.
   It's nicely high-level and easy to work with, and can be optimised where
   necessary.  I have also implemented this particular algorithm in C and on
   GPUs, and I am sure it is faster in both cases, but (without doing any
   formal measurements) the code feels close enough to C for me to experiment
   with new ideas quickly.  I doubt this would have been possible in Python,
   for example (even with numpy - the inner loop is annoyingly hard to
   vectorize)

More on (vector-of :double)

From: andrew cooke <andrew@...>

Date: Wed, 31 Aug 2011 22:50:48 -0300

Justin Kramer at http://stackoverflow.com/q/7223297/181772 sugegsted using 

  (into (vector-of :double) collection)

to create vectors of doubles.  This seemed like a good idea for the immutable
spectra (it's probably not clear above but I introduced arrays of soubles in
two places - the mutable accumulator used to sum spectra, and the indvidual
spectra themselves).  However, when I tried it, I found that there was no way
to add type hints for these vectors.  That led to warnings of dynamic code
when the contents were used.

So using arrays of doubles is heping in two ways - it's supporting fast,
mutable access and also helping the compiler infer (from the ^doubles hint for
the array) the tyoe of the content.

Andrew

PS I posted this on HN where it gained some view, but no useful comments.

Comment on this post